Last updated: 07.17.2025
Published: 07.17.2025

Video and transcript of talk on “Can goodness compete?”

(This is the video and transcript of a public talk I gave at Mox in San Francisco in July 2025, on long-term equilibria post-AGI. It’s a longer version of the talk I gave at this workshop. The slides are also available here.)

Introduction

Thank you. Okay. Hi. Thanks for coming. 

Aims for this talk

So: can goodness compete? It’s a classic question, and it crops up constantly in a certain strand of futurism, so I’m going to try to analyze it and understand it more precisely. And in particular I want to distinguish between a few different variants, some of which are more fundamental problems than others. And then I want to try to home in on what I see as the hardest version of the problem, and in particular, possible ways good values can have inherent disadvantages in competition with other value systems. For example, locust-like value systems that only care about growing and consuming resources as fast as possible. So that’s what I’m going to say is the hardest problem. But there’s a bunch of other versions of the problem that I want to talk about too, and I want to stare at some of the scary trade-offs that I think this problem can create. So that’s the broad plan here.

Basic vibe

To say a little bit more about what I’m talking about, there’s this broad vibe. Now this is not going to be very precise, but I want to give you the vibe and then we can get into the nitty-gritty. So the vibe is something like this: competition. Well, what wins competition? Power. And power is famously, unfortunately, not the same as goodness. So, implication: maybe every opportunity to sacrifice goodness for the sake of power makes you more likely to win. So if the future involves a lot of competition, then goodness loses. That’s the vibe.

Now, noticeably, this is all very, very abstract. What do these words mean: winning, competition, goodness? Let’s be wary of the degree of abstraction here. Let’s stay on our toes. It’s going to be an abstract talk. It’s going to be a weird futurism talk. There’s going to be a lot of stuff that isn’t pinned down. And that’s not good. I’m not saying that’s good. But work with me and see if we can do better together.

Lineage of concern

Okay. So there’s a certain kind of lineage of futurism that I think is just pervasively concerned about variants of “can goodness compete?” I put a bunch of citations here. I’m actually excluding the entire edifice of technical AI alignment. I’m going to talk about that in a second. But there’s a whole extra strand descended from the Future of Humanity Institute and prior to that, to some extent, that I think is concerned about the problem that I’m going to talk about here. And we see it cropping up all over the place in the contemporary AI discourse as well. Notably also, I think if you see people talk about, oh, AI, it’s just capitalism or it’s somehow the same as capitalism, I think that’s actually often expressive of a certain variant of the concern that I’m going to be talking about here as well.

What I mean by goodness

Okay, so what do I mean by goodness? Well, you know what I mean. Maybe you don’t. Maybe I don’t know what I mean. But it doesn’t matter that much, I’m hoping. So the main point is this thing where power is not it, or at least not it conceptually. So we can at least admit conceptually that might is distinct from right — like, these could be different. They’re not the same concepts. Empirically, maybe they come together in some ways. Maybe there’s some fancy conceptual analysis you can do to get from one to the other, but starting point, they’re different. Okay, so that’s the main thing now.

But in particular, to give you some flavor of the type of thing we’re worried about here, classic aspects of goodness that people are worried about include: consciousness, often thought good in certain cases; pleasure; beauty; leisure; concern for the weak; prevention of suffering; adequate reflection on the good itself. That one’s a little meta, but…

Okay, and so the basic nightmare here is you have some sort of hyper-competition (who knows what that means) and it grinds away all these things, because in some sense these things are inefficient. So you know how if you work at a really intense job, then they don’t give you vacation maybe, or they give you only just enough vacation? That’s the kind of thing, but for consciousness. It’s like: consciousness, that’s vacation, strip it away. Find the unconscious workers who don’t need consciousness, that kind of thing, or at least that’s one variant.

Okay, so I’m going to be focusing particularly on the long-term future of goodness. So if you’re only concerned about what happens in the next 100 years, that’s up for grabs.

I want to talk about the very long term, and I’m going to make an assumption, oft made, oft unquestioned, that good futures require at least some decent empowerment of agents optimizing for good values. So the thought is that the source of goodness ultimately needs to be people trying to make stuff good, or at least caring about good stuff. Goodness has to, in the limit, come from agents that are making it happen in the world. Now that is also not obvious, and I’m going to talk a little bit at the end, maybe, about ways it might be not obvious. But I’m going to start with that assumption and go from there. Okay.

And then if you have burning questions, you’re like, “That…” I don’t know, something that really needs answering, then feel free to jump in. But other than that, we can wait. Okay, burning questions? Okay. Okay.

What I mean by competition

So what do I mean by competition? I don’t know.

So, things like: evolution; war (war, solid, that one, definitely that one); competition between firms; capitalism, something; memetic competition. I’m not going to try to define it. But here are some things you could throw in there if you were trying to define it. I don’t know, but things you could consider. There’s probably some mechanism for searching over competitors or competitor strategies; that could be intentional strategization by agents, or it could be natural variation and diversity and then some sort of selection. There’s probably got to be some scarcity, zero-sum vibe to the dynamic. I think there needs to be agent-ishness on both sides. So suppose you had some rocks and they collide and the bigger one wins, it’s still there. Is that competition? No, probably not. It’s not paradigm competition.

Also, if it’s random, if it’s just like there’s a bunch of randomness, that doesn’t feel like competition. Like War. You guys ever play War, the card game? That game sucks. In particular, there’s no hierarchy of War players. You don’t have Elo for War.

And then also I’m particularly interested in competition where it builds on itself: winning builds on itself, at least in the sense of increasing power and influence going forward. That’s actually an important dynamic here.

So that’s the sort of stuff I’m talking about. I don’t know exactly what I mean. Okay.

Can *humans* compete?

So here’s a less fundamental concern, but it’s one that’s very popular nowadays, and the question is whether *humans* can compete. And the answer is no. Humans cannot compete, at least in the following sense. Post-AGI, in the limit of technological possibility, biological human labor is going to be wildly uncompetitive relative to other forms of labor. I think that just really looks true. I think some people are not grappling with this. “Something, something, everyone wants their priests to be human. Everyone wants human poets or something,” with “everyone” here being humans who apparently still have the power to pay for stuff in this world.

In the limit, I think, for almost all forms of labor, humans are very unlikely to be the optimally efficient and competitive doers of that labor. And I think we need to just look at that. If you think that good futures require biological human labor to still be playing an important role in the economy in the long term, well, there are going to have to be some serious, massive constraints on competition to make that happen. And I don’t actually personally think that that’s the right path in the long term. I’m not going to get into that too much here, but I think there’s a bunch of reasons for that. Regardless, though, I think near-term proposals for what happens with human labor as we transition to a world of advanced AI should grapple with this long-term picture. There’s a long-term fact here about whether biological humans are competitive laborers that we need to stare at. Okay, so that’s that. But just because biological humans aren’t going to be competitive as laborers doesn’t mean that human values aren’t competitive in the longer term.

Some distinctions

So let me say what I mean by that. And to do that, we’re going to draw some distinctions. In particular, we’re going to distinguish between three things: there’s labor, there’s the values directing the labor, and there’s the valued stuff. So, an example here: suppose you hire a security guard to protect your kids while they sleep (or while they’re awake, actually). The labor is the security guard, the values directing the labor are your values and your care for your kids. And then the object of your values is your kids. So often in the current world, these things come closely together. Many workers are working on behalf of their values, they’re laboring, and their values concern themselves and their own welfare. But these don’t need to be the same.

And in fact, there’s a common vision (though I think this is problematic in a zillion ways and rests on all sorts of rich ontology we can question) of a post-AGI world where the labor is the AIs, but the values directing the labor are somehow still human values. It’s like the human utility function, whatever that means. And then there’s the stuff that is valued by that utility function. And so somehow you’ve carved off, you took the values out of the human head, and then you got the hypercompetitive AIs optimizing for those values, and importantly… And then you got that payload of those values and that’s separate, that’s off… So there’s a picture, and this is a weird picture of the future, but maybe everyone gets an AI assistant. That AI assistant is hypercompetitive. The AI is out there making money, fighting wars for you, protecting you, boop, boop, boop.

Then it comes back and it’s like, “I got these resources for you.” What are you going to do with them? And then you go and use your resources over there with your compute cluster and you do your nice things with those, over there, separately from the labor. That’s the broad vibe. Now there’s a lot of weird stuff about that, but that’s one way in which you could imagine human values remaining competitive without human labor being competitive. Okay, questions about that ontology? This is a really messed-up vision in a bunch of ways. I think it’s, I don’t know, kind of alien, what are we even talking about? But I think there’s nevertheless a core there that is worth questioning, and I want to go deeper: suppose that broad vibe works. It’s possible to have intense optimization power directed at human values in a way that screens off the uncompetitiveness of human labor. Then do we still have a problem about goodness being competitive? And I think we might indeed still have that problem.

Alignment taxes

Okay, so here’s another less fundamental concern, one that comes up in the context of the image I just gave: technical AI alignment. Maybe you’ve heard of it. It’s a concern and, in particular, it can be understood as one variant of the concern that goodness is not competitive. In particular, one focused on a *way* goodness could be not competitive. Namely, it’s hard to point AI optimization power at goodness for some reason. Maybe it’s hard to point AI optimization power at anything because I don’t know, it’s just hard to point at stuff. Or maybe goodness is particularly hard ’cause I don’t know, it’s hard to measure? People have often thought goodness is hard to measure. Clicks, maybe you can measure clicks, easier to optimize for them. Goodness, somehow harder. This is a thought people have had. Okay. So this is one version of goodness being uncompetitive, but it’s not the one I’m going to focus on.

In particular, you can understand this in terms of an alignment tax: an extra disadvantage you have from having to point your AI at goodness. It’s extra hard. Suppose we get rid of those taxes. There’s another form of possible tax which comes from the content of the values. And I think this is an importantly different form of alignment tax. So for example, deontological constraints: that’s in the values that you pointed the AI at, and those themselves could be a competitive disadvantage in some respects. I’m going to talk about that kind of thing. But I want to distinguish between those types of alignment taxes. And I want to say that even if you got rid of all of the alignment taxes that technical AI alignment is focused on, you still have problems.

And in fact, in my head, I don’t know if people have read papers like Gradual Disempowerment, or people will be like, “Ah, there’s…” People have an intuition that there’s some sort of thing that’s still scary even if you in some sense solved alignment. I think the cleanest way of framing that concern is in this way: that there are other types of taxes, other types of competitive dynamics, that bite even if you’ve gotten rid of traditional alignment taxes in the context of technical AI safety. Okay, questions about that? So that’s the special category of thing I want to focus on here. Okay.

More fundamental variants

Now I want to distinguish between two variants of this concern, which I think are too often conflated. So one is negative sum dynamics in which competition leads to a result that is worse for all participants than some available alternative. That is distinct from this other category that I’m going to focus on, which I’m calling failures of the strategy-stealing assumption. I’ll say more about what I mean. But roughly speaking, this is a case where competition disadvantages good values even in the absence of negative sum dynamics.

These are really importantly different I think. I don’t know, if you read Meditations on Moloch, which is a classic essay about this, I think it doesn’t distinguish adequately between these two. And the solutions or lack of solutions at stake are different. 

Negative sum dynamics

So negative sum dynamics, paradigm examples, classic one from the Future of Humanity Institute literature, it’s called Burning the Cosmic Commons. There’s some great weird papers about this. Basically, you imagine, okay, there’s a colonization of the universe race. Each party races to colonize as much of the universe as possible as fast as possible, wastefully burning tons of resources in the process, all parties would prefer to go slower but they don’t. So that’s a negative sum dynamic. 

Another, maybe more mundane, example: arms races. So you have two parties; they’re in a war, they’re in a conflict. They’re scared of conflict, and so what do they do? They build up a huge arsenal of weapons to counter the arsenal built up by the other party. All parties would prefer to stick with smaller arsenals overall and spend the savings elsewhere, but Moloch. 

So what’s the solution? There’s a classic solution: it’s called coordination. You coordinate. Just do it. Parties foresee the costs of the negative-sum dynamic in question and they coordinate to avoid those costs, for example via suitably credible and stable commitments. Now, easier said than done, but conceptually, I don’t know, there you have it. Do it. In particular, there’s often a hope that our capacity for this sort of coordination is going to increase markedly in the context of better technology. Now again, we can question this. We can question all sorts of ways in which this might go wrong, but I think there’s at least some hope. And the nice thing about this is that everyone wants this coordination to happen. No one is sitting around rooting for the negative-sum dynamics. Everyone is like, can we at least not burn value for no reason? Can we at least transition to a scenario that’s Pareto-efficient? Okay, so this, I think, in some sense is an easier problem.
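To make the shape of that kind of problem concrete, here is a minimal toy sketch in Python (illustrative numbers of my own, not anything from the talk): without a credible commitment, each side’s best response is to build, even though both prefer the world where neither does.

```python
# Toy arms-race payoffs (hypothetical numbers): both sides prefer
# (small, small) to (build, build), but each side's best response is "build".
payoffs = {  # (A's move, B's move) -> (A's payoff, B's payoff)
    ("small", "small"): (3, 3),   # both save resources
    ("small", "build"): (0, 4),   # the restrained side is exposed
    ("build", "small"): (4, 0),
    ("build", "build"): (1, 1),   # worse for both than (small, small)
}

def best_response_for_A(their_move):
    # A picks whichever move maximizes A's payoff, holding B's move fixed.
    return max(["small", "build"], key=lambda m: payoffs[(m, their_move)][0])

print(best_response_for_A("small"))  # build
print(best_response_for_A("build"))  # build -> both land on (1, 1)
# A credible, enforceable commitment that removes "build" for both sides
# lands everyone at (3, 3): the Pareto improvement everyone already wants.
```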

The strategy stealing assumption

Okay, what’s a harder problem? So the strategy-stealing assumption says roughly that agents with good values don’t have any inherent competitive disadvantage relative to other agents. Because agents with good values can, quote, “steal the strategies” that other agents pursue. This is a term from an essay by Paul Christiano, I think, from 2021. So the paradigm vibe here is: suppose you’re a utilitarian and you’re competing with a paperclip maximizer, right? Well, what do you do? You do all the same stuff. You just do the same stuff everywhere except at the end when you build paperclips or make utilitronium. So you fight, all this ruthless competition, or I don’t know, maybe you coordinate, it’s unclear. But the point is conceptually you and the paperclip maximizer are identical except with what you do with the resources you gather. And so the idea for the strategy-stealing assumption is that that means you’re at least on a par.

Locust-like value systems

Okay, so is that true, though? So here’s a salient way it could fail: locusts. So locust-like value systems just want to grow and consume resources as fast as possible. So they don’t want to use the resources. The paperclipper wants to go out there, get the resources, and then do something with them, in particular, paperclips. The classic reason to want resources is for some other reason. Locusts, they just want to burn the resources as fast as possible. That’s what locusts are all about. Why? No, no, no, no, they just want to do it. So now we can talk about whether it’s realistic to expect value systems like that, but this is just an example to get a sense of when strategy stealing could be wrong. So suppose you’re doing this colonization game with the locusts, here’s a concern you could have. It could be, empirically we don’t know, that you get some advantage in grabbing resources from really wastefully burning them, in ways that you wouldn’t want to do if you cared about using the resources for something else.

So the vibe here is something like: you show up in a new galaxy or something like that, and the main thing you do with that galaxy is burn it as fast as possible to get a little 1% boost in the speed with which you can get to the next galaxy. And you don’t want to do that if you want to turn the galaxy into something, even into paperclips; to the paperclipper, that’s crazy. But the locusts are onto the next galaxy as fast as possible, right now. So importantly, you also can’t coordinate with the locusts to prevent this. So in the paperclipper case, you and the paperclipper, you could get together and you’d be like, how about we don’t do that? And everyone’s like, yeah, for god’s sake, right? Locusts, they’re like, “No, sorry, this is what I like.” And so this is why I think failures of the strategy-stealing assumption are importantly distinct from negative-sum dynamics: the role of coordination is very different, and same with the virtue…

I think a lot of people have this image that in the context of negative-sum dynamics (and I think this is right) there’s a virtue in not playing into it, not participating, because you want to causally coordinate, something, something, to not go down the bad path. That doesn’t work with locusts, in the same sense that cooperating with a defect bot doesn’t work. You don’t want to cooperate with a defect bot, actually, ’cause you’re not going to make the defect bot cooperate. So if there’s a being that intrinsically values defection, or if it’s a leopard or something, don’t cooperate with a leopard. You see what I mean? So there’s an important difference between whether there’s a coordination available to move to some other thing, or some direct, deeper conflict between you and another agent. So how bad this is depends on a bunch of gnarly dynamics to do with the physics of space colonization, which I personally don’t… I’m not an expert in.
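As a minimal illustration of the defect-bot point (toy payoffs of my own, purely for flavor): when the other player’s move doesn’t depend on yours, restraint buys you nothing, unlike against someone who would reciprocate coordination.

```python
# Toy payoffs (hypothetical): my payoff as a function of (my move, their move).
PAYOFFS = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 4,
    ("defect", "defect"): 1,
}

def defect_bot(my_move):
    # The locust-like player: defects no matter what I do.
    return "defect"

def reciprocator(my_move):
    # A player whose move tracks mine (the kind coordination can work with).
    return my_move

def my_payoff(my_move, opponent):
    return PAYOFFS[(my_move, opponent(my_move))]

print(my_payoff("cooperate", defect_bot), my_payoff("defect", defect_bot))      # 0 1
print(my_payoff("cooperate", reciprocator), my_payoff("defect", reciprocator))  # 3 1
# Against the defect bot, cooperating only costs me; against the reciprocator,
# mutual restraint beats mutual defection.
```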

Some people I know are very optimistic. Actually, it’s fine, because, like, something, something, you get up to max speed and the galaxies, they’re defense-dominant. I end up in these conversations where people are like, “I’m confident that space warfare is defense dominant.” People say this to me all the time and I’m like, “Wow.” Suppose space warfare is not defense dominant. You might also worry about arms races with locusts, because they want to invade you and you can’t defend against them with just a little bit of resources. You’ve got to build up your arsenal. And in this case, again, the locusts, they love to build weapons. They just love weapons. So you can’t coordinate to prevent the arms race.

Other ways the strategy stealing assumption might fail

Now maybe that’s a special case, but there’s some other ways the strategy-stealing assumption can fail as well. 

So to list a few examples: deontological constraints. So maybe the paperclip maximizer is willing to lie, cheat, steal, et cetera, and good agents aren’t. Subject of much fiction, right? It’s like, oh, you’re there. It’s like you’re fighting the baddie. The baddie is willing to do the bad thing. Often in fiction, goodness somehow wins that one, we can hope, but that’s a concern. 

You could also have direct disvalue created via instrumental strategies, or risk created via instrumental strategies, that some agents are up for pursuing. So maybe the most competitive strategy involves a bunch of suffering or exploitation or tyranny or something like that, and you directly don’t want to do that. That’s directly bad on your values in a way that isn’t true of your competitor. That’s a scary risk as well. I think a lot of people are anchored on… So for example, people will sometimes be like, “Well, liberal economic arrangements or non-dictatorships or whatever have been awesome so far, for the past 100 years, a full 100 years of human history. We’ve had blah data about which sorts of political arrangements are most competitive.” And you might worry that that’s a little underpowered as a study of the space of possible political strategies; that there might, in fact, be other political strategies that are less congenial to your values and are, in fact, more competitive. We don’t know. 

Unique vulnerabilities – so maybe good values, they specifically need to protect the frail biological humans and that’s a unique vulnerability. That’s one example. 

There could be some value systems, maybe, that care less about value drift. They’re like, “Ah, whatever.” Just like, “My mind-children, whatever, I don’t care what they value.” And you therefore save on the expenditures for preventing value drift. 

Persuasion asymmetries: maybe it’s easier to persuade people of some values than others. 

Maybe agents that are just less reflective or cautious have a competitive advantage even if they’re making mistakes by their own lights. So let’s say your strategy, it involves a bunch of suffering. Some agent is like, “I’m not even going to think about that.” If they thought about it, they’d be like, “Oh, actually, that’s messed up,” but they don’t. They don’t even think about it, they just go for it. So these are examples of ways in which the strategy-stealing assumption can fail. 

Addressing failures of the strategy stealing assumption

Okay. Okay, so how would you address that? So the classic answer to this is to prevent/constrain the relevant form of competition. So basically, if there’s some other value system that would be more competitive in the relevant way and it would otherwise have arisen via the default trajectory, then you need to either make sure that it doesn’t arise or that it doesn’t grow enough to become problematically competitive. And so some examples here would be: sorry, no creating agents with locust-like values, you’re not allowed to.

Related: enforced limits/prohibitions on space colonization, reproduction, growth, tech progress, buildup of military power. Okay, so this is scary stuff, and I hope you can see the potential for a lot of scary ways in which this top-down control, or exertion of steering and culling… I don’t know, I have this picture of Cronus eating his children, it’s a myth. He got a prophecy that his children were going to overthrow him, so he ate them all except Zeus for some reason or something, and you eat them before they get too… It’s like you’re killing your children in the cradle so they don’t get too powerful. It’s scary stuff. But this is the sort of thing that is often countenanced as a solution to competitive asymmetries, inherent competitive asymmetries in values. And part of what I want to do here is stare at that and be like, “Wow, that’s really scary,” and is that necessary? How would we mitigate the risks that involves? But I want to look at it.

Now notably, some similarly scary stuff comes up in other contexts too. So maybe you were already concerned about just directly sadistic value systems; maybe you didn’t want people to create suffering maximizers. You might’ve wanted that for other reasons. Same with the proliferation of access to dangerous technologies, recipes for ruin that could just destroy everything. There are questions about whether that would give rise to a need for similar sorts of control. 

So in some sense, this is just another version of a general problem that arises in the context of futurism of this kind, which is this: there’s this sort of broad dialectic between threats from too much control, bad forms of control being exerted in the world, and threats from too much bottom-up proliferation of different things. This is a very hazy dialectic, but I think this is the kind of… Competition concerns are concerns about too little control. There’s too much pluralism, too much multipolarity, too much decentralization, and you need to get in there and mess with it. And so it’s used, along with other things, as an example of a case where you need this top-down-ness. But top-down is very scary too. And so you’re caught in between. 

Okay, questions about that basic dialectic? Okay. How are we doing on time? We’re doing okay. Okay. 

Is preventing/constraining competition in this way even possible?

So one question you could have is whether preventing/constraining competition in this way is even possible. It notably requires intense degrees of foresight and control, including with respect to various types of internal conflict that could arise within the system, and unforeseen disruptions and changes to the environment and what’s going on. Value drift, too: even as you’re hopefully reflecting and growing in other ways, you don’t want to just have an imposed stasis where nothing ever changes, there’s never any exploration, growth, learning, et cetera.

So you need to do that in a good way, but not in a way that runs into the problem you’re trying to prevent with respect to bad competition. There are selection processes taking place across all sorts of levels of abstraction. Do you need to shut all those down? What does that even mean? Instrumental values becoming terminalized, where you… And to some extent, this is just it: there’s a gradient pressure towards locust-ness. As you start to intrinsically value instrumental goods, that’s maybe a competitive advantage, and so this gets selected for in a bunch of ways. So the question of whether this is feasible at all is related to the lock-in hypothesis. And relatedly, there’s this broad vibe in a lot of futurism, which is: something, something, just assume you get to the max of tech. There’s tech and you get it all. You get tech maturity, and then the amount of new tech you’re getting peters off. And then in tech maturity you can do whatever you want. The world has become clay in your hands.

Is that true? Unclear? Right? Do we even want that to be true? It’s a little stuck. Anyway, but it still might be true. Anyway, so there’s a question of whether this is even possible. 

Is preventing/constraining competition in this way desirable?

As I said, though, there’s also a question of whether it’s desirable. And that question depends on a bunch of complicated specifics about the specific choice situation you’re facing and the specific kind of prevention or constraint on competition that you’re pushing for. There’s one question: how bad is the competitive world? I’m going to talk about that a little more in a second. And then there are the expected results of attempting to prevent/constrain competition in the relevant way. Possible results include, one that’s often focused on, your favorite possible version of a prevented or constrained competitive landscape. How likely is that? It might not be what you get.

Classic problem with, “Ooh, let’s have more top-down control,” is like you love it when it’s the perfect way, but then it’s not the perfect way and then maybe it’s worse. People discover this in government. 

And then there are also just worse versions of this type of thing: full totalitarian versions; also just stagnation, incompetence, stasis, homogeneity; the vices of guilds and monopolies, where they’re worse and they’re squashing anyone else from getting into the thing; an end to growth, learning, discovery, exploration, experimentation. I want to prime an intuition about the nightmare of locking the world down, and this is something a lot of people are concerned about. The people who are really concerned about the world stagnating, or having a homogeneous world culture that’s maladaptive or something like that, are very scared of this sort of thing. And I think there’s something real there.

What is a “locust” world actually like?

Okay. I want to also just spend a little bit of time on the locusts. Insofar as we’re especially scared of something like locust-ness, I want to talk about: okay, what is the locust world actually like? And what is the locust world? It’s a little unclear. I want to consider a world where there’s this constant effective selection pressure towards agents that intrinsically value whatever facilitates their long-term power, growth, resource consumption, et cetera. And so I think a classic hypothesis here is that this ends up effectively valueless. As I said, you just grind away all the consciousness, joy, et cetera. You get a convergence on a homogenous hyper-optimized strategy. Sometimes it’s imagined as gray goo: this valueless replicator takes over. Another hypothesis, as I mentioned, is that it’s actively quite bad. It involves a bunch of suffering, et cetera. 

Might a locust world be less bleak than this?

Okay, but is that right? I think this might be underselling the locust world a little bit. Also, “locusts,” that’s tough. That’s a biasing label.

So one way to get happy about the locust world is just to value strength, power, resource consumption for its own sake. Some people flirt with that, they talk like that. I don’t like that. So framed, I want to set that aside, but I think it’s still important that locusts aren’t mindless gray goo. So to be a locust, you have to build spaceships, for example. It’s like there has to be cool tech. You have to be superintelligent. And in fact, you’re going to continue to try to become as superintelligent as possible if you’re a locust. So they’re still growing, learning, et cetera, et cetera. And then there’s this key question, which is: how much are things like consciousness, joy, beauty, creativity, knowledge, love, cooperation, et cetera, tied really closely to instrumental values like power, growth, resource acquisition? Close enough that hyper-competition wouldn’t de-correlate them?

So the broad vibe in all this stuff is: if two things are different, and there’s optimization power being pointed at one and not the other, they’re going to be really different. They’re not going to come together. That’s the general concern that, I think, is just underlying a huge amount of post-AGI futurism, including in this case. So it’s like: you optimize for power. It doesn’t matter if they’re correlated. In the end, things aren’t correlated if one of them got optimized for. Now, but you could question that. So an example here is, just stick with consciousness. I think there’s a really open question. We don’t understand consciousness, but there’s an open question: how brittle or robust is consciousness? And is consciousness, in particular, the sort of thing that you get basically out of being an intelligent self-reflective agent, period?

In particular, an agent that has basic introspective capacity, basic self-reflection, basic modeling of self and world. If consciousness is really basic, if it’s something that any functional mind at a certain level of sophistication has to varying degrees, then you might think a locust world at least has a bunch of consciousness. Unclear. I will say, with all of this, we don’t know, but I just want to be clear: there’s at least an open question. 

So maybe you could get into this vibe with pleasure too: like, maybe pleasure is caught up with agents getting what they want. Maybe there are a lot of agents in a locust world, they’re doing things, they still have preferences. That one, I’m like, seems a little tougher.

You could think this about beauty. Some people are like: beauty, it’s just functionality, it’s just elegance. Or the gazelle: “Isn’t that beautiful, and isn’t that kind of just evolution?” I’m like, I don’t know, missiles, they’re not actually that beautiful, right? Factories are okay. They’re not that good, actually. Classically, people are like: form, function, they’re maybe not exactly the same. But some people hope for that stuff, yeah, that kind of stuff.

Some people are like: love, cooperation, friendship. Maybe these are just cooperation, maybe somehow that’s instrumentally incentivized and you want to be nice to people. This one I’m like, maybe a little bit, but you don’t… People aren’t nice to the weak and then it’s like… Anyway, but some people talk this way. 

They’re like: the locust world is going to have all this good stuff, and in the limit you can go really far and you can be basically a… You can fully unify goodness and power and see evolutionary selection as the march of God in the world. The extreme version of this is Pierre Teilhard de Chardin. There’s a book that expresses this perspective. It’s like Nick Land, except it’s good. 

And the book has some great gems by the way. There’s this bit where he’s like, “Could we die from an asteroid?” And then he’s like, “No, no.” Why not? It’s like, ’cause God clearly wants us to reach the Omega point. And it’s like the humans, it’s just… can’t you just see the narrative arc is towards reaching the… The Omega point is this, it’s like the singularity. It’s like the march of God, God becomes self-conscious, something, something, complexity, consciousness, love. It was all unified. Jesus looks down on history from the future. Anyway, actually, I didn’t like the book, but you can read it for an expression of this vibe. 

Another question is whether the locust world is repetitive. So I think a decent amount of the work when people are like, “Oh, paperclips,” is this: what’s bad about paperclips? It’s so samey. It’s like paperclip, paperclip, it’s tiled. We don’t like tiling, right? Same problem for the utilitarians and stuff. A potential advantage of the locust world is that maybe it’s not like you just get the most competitive thing; you have this ongoing crazy teeming diversity, like ecosystems.

Stuff is just changing all the time. People are adapting different strategies, etc. And maybe it’s interesting and pluralistic or something. Okay, so I’m saying all this because I don’t want to undersell it; I want us to look at this and know exactly how bad or good this is. I think it could be really bad, to be clear. And I think it’s sufficiently plausible that a world of totally unconstrained hyper-competition is just really, really bad; sufficiently plausible, in my opinion, that we should not just be like, “Go for it.” I do want us to be staring at all aspects of the trade-offs here, including this. 

Current overall take

Okay. So my current overall take: “Can goodness compete?” is a problem even beyond the economic value of biological human labor and technical AI alignment. I’m optimistic about avoiding many negative-sum dynamics via coordination. It looks to me like the strategy-stealing assumption may well be false in important ways, for example re locusts, and coordination doesn’t solve that.

Locust-like futures might be okay. Or also, by the way, the locusts, they still have war. It’s still bad. I think the vibe here would be like: how do you feel about nature right now? And some people are not excited about nature right now. Some people are like: it’s pluses and minuses with nature right now. And so maybe the vibe is that that’s what a locust-like future would be like. I think the best futures, at least, would require a good deal of preventing/constraining competition, at least re locust-like value systems, and this despite the many risks that this entails. So we should look the scary trade-offs this implies in the face. 

Poem: “Witchgrass”

Okay, so that’s the talk. I think I actually have enough time to read this poem that I wasn’t sure I’d have time for, but actually, I’m going to read. I just want to end with… Okay, I’m just going to read the poem. It’s called Witchgrass, and witchgrass is a weed. It’s a poem by Louise Glück. I’m reading this partly ’cause it’s from the perspective of the weed and I want to give weeds a voice.

Witchgrass
Louise Glück

Something
comes into the world unwelcome
calling disorder, disorder—

If you hate me so much
don’t bother to give me
a name: do you need
one more slur
in your language, another
way to blame
one tribe for everything—

as we both know,
if you worship
one god, you only need
one enemy—

I’m not the enemy.
Only a ruse to ignore
what you see happening
right here in this bed,
a little paradigm
of failure. One of your precious flowers
dies here almost every day
and you can’t rest until
you attack the cause, meaning
whatever is left, whatever
happens to be sturdier
than your personal passion—

It was not meant
to last forever in the real world.
But why admit that, when you can go on
doing what you always do,
mourning and laying blame,
always the two together.

I don’t need your praise
to survive. I was here first,
before you were here, before
you ever planted a garden.
And I’ll be here when only the sun and moon
are left, and the sea, and the wide field.

I will constitute the field.

Okay, thank you.

Q&A

Questioner 1:

Thank you for the talk. I’m really glad that I thought about this through you. I’m really glad you thought about this. But is it actually a problem? If we assume the coordination problem away, can’t we just coordinate to allow value diversity so long as there aren’t any locusts? And then for acausal trade, we can’t trade with locusts anyways, so why bother? And then if there are alien locusts, then we can’t win unless we have more resources from the start. So we should just calculate the likelihood that there are locusts, and then accept the likelihood of defeat, and then just have our values, whatever they are, all the people that value something that’s not just terminal growth. What do you think of that?

Joe Carlsmith:

Yeah, so I think you could think: yeah, this is a problem, but it’s only a problem with locusts, so just no locusts. No locusts, no sadists, everyone else, fine. Sadists, that’s a separate thing. Sadists are not necessarily competitive, it’s just that sadists are, it is tough what they… Yeah, so you could think that. I think you still need to grapple with all of these guys. Now you could be like, “Ah, let’s go through them one by one, I’m a consequentialist,” whatever. Maybe disvalue risk: it’s like, “Ah, it’s not optimally disvaluable.” I don’t know. You could go through and do… I think locusts are the most worrying. And I think, to the extent you’d be less worried about a wide swath of value systems, the better this problem is. But you do want to grapple with all of the possible ways the strategy-stealing assumption might fail.

I will also flag that in the real world, this stuff comes up all the time without these really clean decompositions between your values and your optimization power. I’ve been trying to take it to the limit where I’m like, “Okay, there’s…” So for example, I haven’t talked very much about ways in which just making mistakes or being wrong (like this last one, where it’s not a problem with your values, it’s a problem about rationality or other forms of incaution) are also on the table here anyway. But yes, if you have gone through all of the possible failures of the strategy-stealing assumption and you’re like, “Ah, it’s a narrow range,” then just prevent that particular strand of value from becoming sufficiently powerful; that helps.

Questioner 2: 

Do you think it’s obvious that locusts have to be highly intelligent to be doing the things they want to do? Or at least that there has to be lots of high intelligence in a locust world? It seems surprising to me that we think that has to be the case.

Joe Carlsmith:

You’ve got to build spaceships, if you want at least to consume all the resources. I think there’s a question of how… We do get killed by viruses and stuff, and you could have blight. You could have gray goo. That is possible. I think the version that is most concerning is where you have the full power of superintelligence directed towards the locust-like values, because then you don’t get any advantage from that, from your having it.

Questioner 2: 

It plausibly doesn’t need to be much intelligence. You could have a superintelligence that just starts a blight every time it hits the galaxy. It might be the most efficient way…

Joe Carlsmith:

Well, but I don’t know, does it have to fight other beings, and are they smart? Does it have to solve problems, new problems that come along? How fast does it solve those? With what resource efficiency? So eventually you do need to be smart if there’s a bunch of competition, or at least I think that’s plausible. It does depend on the competitive landscape. So you could have a blight or a locust-like thing that is a local maximum in the competitive landscape.

I think the notion of a local maximum is a little complicated, ’cause I don’t think the notion of competitiveness is a kind of coherent hierarchy. I don’t think there’s a single metric of competitiveness, which I think is important. In fact, the competitive landscape has loops: A can beat B, and B can beat C, and C can beat A, and it can matter. Shark beats gorilla in water, but gorilla beats shark on land. So competition is not simple, I think. But in principle, there are possible locusts that are not superintelligent, yeah, potentially. 
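As a minimal illustration of that point (hypothetical match-ups, nothing empirical): if who beats whom depends on context, the “beats” relation can contain loops, and then there’s no single competitiveness ranking at all.

```python
# Hypothetical match-ups, purely illustrative: a cycle in the "beats" relation
# means competitiveness can't be a single well-ordered scale.
beats = [
    ("shark", "gorilla"),   # in water
    ("gorilla", "shark"),   # on land
    ("A", "B"), ("B", "C"), ("C", "A"),  # a rock-paper-scissors style loop
]

def has_cycle(edges):
    # Depth-first search for a cycle in the directed "beats" graph.
    graph = {}
    for winner, loser in edges:
        graph.setdefault(winner, set()).add(loser)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, ()))
    return any(visit(start, set()) for start in graph)

print(has_cycle(beats))  # True: no total order of "most competitive" exists here
```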

Questioner 3:

So it seems like what we really need is some set of tools for thinking about what a world full of kind-of locusts might be. You’ve walked us through a world in which full-on locusts emerge, but it’s going to be a spectrum, and we’re probably going to be not at the complete extreme end of the spectrum. Do you have any thoughts?

Joe Carlsmith:

What are you thinking is the key difference that the median-ness makes there?

Questioner 3:

Yeah, just: we don’t currently have any locusts, and so at some point the locusts that develop will not be entirely, fully locusts. They’ll be like half-locusts. And that’s going to be when our intellectual framework for thinking about how to deal with this kind of thing actually matters. And so if we’re doing philosophy and we’re trying to think about this, we need a framework for dealing with that world, not a full locust world.

Joe Carlsmith:

Yeah, I think as a first pass, my framework would be that there’s still a gradient towards locusts. Now, I’m feeling bad about using the term locust so much. I feel like it’s like, I don’t know, you’re calling these beings… I don’t know, “power maximizer,” is that better? Or something that doesn’t imply “pest” or something, I think, would be good. I don’t know. So I’m regretting “locusts” here. 

I guess I also will say: Nick Land’s worldview is that in some sense, locust-ness (sorry, power maximization, intrinsic valuing of instrumental goods) is the telos of capitalism, and that that structure is already immanent in the sociopolitical landscape.

So I don’t think this is necessarily just something that is far off, or a separate concern, to some extent. You might think that there are forces already at play in the world that are, in some sense, at some level of abstraction, well understood as intrinsically power-maximizing. Yeah. But broadly speaking, the thing I would say is that the same points about preventing and constraining the relevant form of competition will apply, albeit to the gradient towards a more intense power-maximizing world. 

Questioner 4:

So it seems like the way to preserve goodness in the future is to some extent limit competition or exercise a large degree of control over all agents. And this seems like it would be effective except for the fact that eventually you’ll encounter alien civilizations, some of which may be locusts or highly-competitive power seeking. And if you have limited the competitiveness and power seeking of your own civilization to preserve goodness, then you may just be out-competed by the alien civilizations. So are we just screwed? Do we need to just hope there are no alien civilizations close enough for this to matter? What do you think?

Joe Carlsmith:

Yeah, I guess on this framework, if you were going to encounter full-blown, fully-grown power-maximizing civilizations as aliens, then yeah, you would lose or otherwise get caught in arms races, wasteful dynamics. To some extent, that’s just the implication of having failed to prevent the relevant form of competition from taking place. 

I think people are awfully confident that we’re going to run into aliens. My vibe: I feel like people, they’re like, “Ah, I read the Grabby Aliens paper. It’s something, something.” And it’s like, “Yeah, the Grabby Aliens paper has all this stuff. It’s all…” Anyway, to me, it feels like the most natural explanation for the Fermi paradox is just that life is really, really rare and the world is really, really big. And so I feel very unsurprised if there’s just no one in the light cone. That’s my current guess, mostly ’cause I feel like the reasoning in Grabby Aliens is very fancy and that reasoning is very unfancy, but I could be wrong. Anyway, but yeah. Okay. Yeah, I guess people on the edge are getting a competitive disadvantage or advantage here.

Questioner 5:

Oh, cool. Thanks. Yeah. So I guess there are two different claims that one could be making about the flaws in the strategy-stealing assumption. One is that the strategy-stealing assumption generally fails in favor of bad values or something. And the second one is just that the strategy-stealing assumption is not true, but it could fall on either side. And I think there are certain kinds of arguments here that seem to be of that form. You can expect that destroying certain forms of things might also be against the bad agent’s values or something like that. So in general, do you have a sense of, if you just get rid of the arguments that could fall on either side, how many arguments actually still hold for the strategy-stealing assumption?

Joe Carlsmith:

Sorry, I feel like there was one point there which is like, hey, maybe goodness has an advantage over some value systems, too, fail on either side. Is that what you meant by fail on either side?

Questioner 5: 

Oh, no. I guess what I meant is, for some of these specific things, it’s really hard to say whether a good agent or a bad agent will be preferentially advantaged by the failure of the strategy-stealing assumption. So it’s hard to see the specific words from back here. But I think one of these is like: oh, if you need to do things like destroy resources or something, or lock in certain things, in order to pursue your own values, then this is bad. And you could expect that maybe certain agents with bad value systems are also really disinclined to do certain things, and this could also harm them, but we just have no idea. It depends on the specific value itself.

Joe Carlsmith:

Yeah. So a way in which talking about the strategy-stealing assumption failing is a little cheat-sy, is I’m allowed to pick the value system relative to which good values potentially are outcompeted. And so I get to choose from across the space of all possible value systems. So you could have a bad value system that also has deontological constraints or something, and then the deontological constraints thing would not be an asymmetry. Or you could have a value system that also needs to protect weak, vulnerable humans or something, but it’s different from yours in some other respects. So I think yes, the strategy-stealing assumption being true across all possible value systems is a really quite intense and surprising claim in some sense that there’s no value system that gets an advantage over you.

And to some extent, the locusts are this weird edge case of: I’m picking the thing that you hope to coordinate not to do, and I’m saying they terminally value it. That is a little cheat-sy. And I think this is a reason for hope, in the sense that you could think whatever search over possible competitors is taking place, it’s not going to cover the specific type of value you’re concerned about. Yeah. I think for some of these, though (the deontological constraints one, for example, this comes up a lot, and some of the questions about which political systems are most competitive), I think if you’re really grokking that goodness’s pursuit of its values is going to have to be really instrumentally ruthless in a bunch of ways, I do think that can pump an intuition about worries about strategy stealing, even irrespective of the specificities of the values you’re competing against. Yeah.

Questioner 6:

So in the slide about the locust world or the unconstrained competition world, you said that probably they won’t have love or various other… beauty or things like that. And I think that makes sense because love the way we experience it is because of human social ecology. But maybe they will also have other things that are valuable from some impartial point of view, even though they’re not the things we would list as what makes life valuable. Does that make it seem more optimistic to you?

Joe Carlsmith:

Maybe. Maybe. Yeah, I think it’s a little bit hard to think about the space of possible values that you haven’t discovered that you would value, and then to reason about whether they’re likely to crop up by accident as a result of unconstrained competition. You might use some prior of: specific things don’t happen. So if it’s a specific thing, it won’t happen by default. I don’t know. It’s a classic prior. Anyway. Well, yeah, in particular to the extent, yeah. The case for that would come centrally from some connection between power and goodness, or between power and these new values. A thing you could think is: maybe your construal of your values should ultimately route harder through evolution as a structuring force for your concepts. Maybe if you truly discover what you value, the process of refining your concepts would move them more in the direction of connection with power, in which case you could get into this vibe. But that’s a very particular way of precisifying your values.

Questioner 7:

Yeah. Could you explain the strategy-stealing assumption a bit more? Because it seems obviously crazy to me, and I feel like it can easily… I don’t know, you can have one value system which is just really, really hard to achieve, like finding a collision in some large hash value. And this value system’s strategies are just going to look wildly different from any regular strategies then. It’s like-

Joe Carlsmith:

The strategy-stealing assumption, I guess, is not that for any value system, that value system is equally competitive with every other value system… That does sound very false. So I think you’ve got to start with your best guess about your own values and then try to see whether those values have particular disadvantages relative to others. And I think it’s no surprise, for example, that the strategy-stealing assumption comes from a broadly consequentialist tradition, which is imagining competing with paperclippers. And it’s like, oh, come on, we’re all doing the same thing, just different uses of your computers at the end of the day or something. But if your values are different from that, then it might be more obviously false, or less obviously true. 

Questioner 8: 

Thank you. I was expecting you to make another step in your argument, and maybe I just missed it or I’m being silly or something. But I guess there are two stories about competition you could tell. One is that it erodes away goodness, and so it seems bad. And there’s that. And then I guess there’s the other one, which is that it rewards goodness. And I think of markets as the paradigmatic example: people are in competition, but they’re in competition for people’s preferences and stuff. And I guess my instinct would be that AI follows more like the market case, because AI will catch on in a market setting and people will want to buy these things and so on. And so I guess I’m wondering why your instinct is to talk much more about this version of competition. And I know there’s this, but it seems like your whole… the way you are thinking about competition is with this idea that it’s going to erode away goodness rather than that being kind of like the telos of where it’s going.

Joe Carlsmith:

Yeah. So I tried to at least obliquely reference the benefits of competition in all of my “be scared of shutting down competition” vibes. So what do markets really do? As a first pass, they respond to demand weighted by the power and resources of the demanding entities. So markets meet people’s preferences to the extent that those people have money or whatever. And so you can have a market, and there are all these unconscious agents in there. One of them is maximizing staples and another is maximizing paperclips. They’re doing great trades. New factories are being produced for… But it’s not necessarily good, I claim, or at least it’s only good qua the set of preferences at stake in that world.

But it might not be good relative to your values if you’re into, for example, conscious entities or whatever. And so maybe you aren’t, and maybe your conception of goodness is: I just want whatever landscape of agents exists to be getting their preferences satisfied. And I think if that’s your particular ethic, including for sadistic agents maybe, or who knows, then in some sense certain sorts of competition might look good. But I think if you have something less agnostic about the sorts of preferences that are being satisfied in the world, then I think you have to wonder whether those preferences will have the sort of power required to pay for the stuff on the market.

Questioner 9:

I was wondering what your ideas are for what you’re going to ask yourself next in this line of work. And I ask that in part because the conclusion sounds a lot like what liberal democracy has done amidst capitalism forming: targeted constraints on competition. So that’s your current take. What are the open questions for you now?

Joe Carlsmith:

The thing that I’m actually doing right now is, I’m being like, I’m trying… So yeah, I think this is the first-pass take, and I’m not departing deeply from that here. I’m trying to structure our thinking about this and be like: yeah, you need the right mix of bottom-up and top-down. You just need that right mix. Oh. But I think sometimes the true take is kind of boring. The thing I’m actually doing that feels somewhat novel to me is trying to grok a little more some of this stuff about like, wait, can we be serious about what a locust world would look like and make sure we’re understanding…

I do think there’s something about the growth of knowledge and consciousness in its own right, or sorry, knowledge in its own right or intelligence in its own right, that is nearby to a notion of locusts. I think this is what Nick Land is really into.

He’s like: the will to think. An intelligence will go off and think. It must think. And I’m actually interested in that. I’m like, “Huh.” I don’t know, there’s some potential different ontology to do with ways in which… I’m interested in whether orthogonality is really right. I think a lot of this is rife with these really deep structural orthogonalities, and I’m trying to make sure that that’s true. Yeah. So another example: this is all rife with this picture where there’s a utility function and there’s the optimization, and these are separable, right? Is that right? It’s not really right. Well, maybe it’s not right, but you can’t just be like, “No, they’re not separable.” You need to have an actual, I don’t know, alternative ontology that actually works. Anyway, so that’s where my own thinking is at the moment.

Host:

All right. Thank you, Joe Carlsmith, for coming. Thank you, everyone else. This is the official end of the event, but you’re still here. Joe Carlsmith is still here for some time. And so yeah, have fun. Bye.