My work at Open Philanthropy focuses on making sure that the development of advanced AI systems does not lead to existential catastrophe. I've written a long report ("Is Power-Seeking AI an Existential Risk?") on what I see as the most important risk here -- namely, that misaligned AI systems end up disempowering humanity. A video summary of that report is available here; slides here; and a set of reviews here. Before that, I wrote a report on the computational capacity of the human brain (blog post summary here), as part of a broader investigation at Open Philanthropy into when advanced AI systems might be developed (see Holden Karnofsky's "Most Important Century" series for a summary of that investigation).
Extra content includes: AI collusion; the nature of intelligence; concrete takeover scenarios; flawed training signals; tribalism and mistake theory; more on what good outcomes look like.
Extra content includes: regretting alignment; predictable updating; dealing with potentially game-changing uncertainties; intersections between meditation and AI alignment; moral patienthood without consciousness; p(God).
Garden, campfire, healing water.
Examining a certain kind of meaning-laden receptivity to the world.
An intro to my work on scheming/“deceptive alignment.”
Examining a philosophical vibe that I think contrasts in interesting ways with “deep atheism.”
What does it take to avoid tyranny towards the future?
Let’s be the sort of species that aliens wouldn’t fear the way we fear paperclippers.
Who isn’t a paperclipper?
Examining Robin Hanson’s critique of the AI risk discourse.
On the connection between deep atheism and seeking control.
On a certain kind of fundamental mistrust towards Nature.
AIs as fellow creatures. And on getting eaten.
Introduction and summary for a series of essays about how agents with different values should relate to one another, and about the ethics of seeking and sharing power.
My report examining the probability of a behavior often called “deceptive alignment.”
Superforecasters weigh in on the argument for AI risk given in my report on the topic.
How worried about AI risk will we feel once we can see advanced machine intelligence up close? We should worry accordingly now.
Building a second advanced species is playing with fire.
Report for Open Philanthropy examining what I see as the core argument for concern about existential risk from misaligned artificial intelligence.
Video and transcript of a presentation I gave on existential risk from power-seeking AI, summarizing my report on the topic.
Report for Open Philanthropy on the computational power sufficient to match the human brain’s task-performance. I examine four different methods of estimating this quantity.