All Writing

Video and transcript of talk on human-like-ness in AI safety

From a talk I gave at Constellation in December 2025.

How do we solve the alignment problem? / Part 9:

How human-like do safe AI motivations need to be?

AIs with alien motivations can still follow instructions safely on the inputs that matter.

Leaving Open Philanthropy, going to Anthropic

On a career move, and on AI-safety-focused people working at AI companies.

How do we solve the alignment problem? / Part 8:

Controlling the options AIs can pursue

On blocking paths to power, and on making deals.

Video and transcript of talk on giving AIs safe motivations

From a talk at UT Austin in September 2025.

How do we solve the alignment problem? / Part 7:

Giving AIs safe motivations

A four-part picture.

2025

Video and transcript of talk on “Can goodness compete?”

Video and transcript of talk on AI welfare

The stakes of AI moral status: Part 1

Video and transcript of talk on automating alignment research

Can we safely automate alignment research?: Part 6

AI for AI safety: Part 5

Paths and waystations in AI safety: Part 4

When should we worry about AI power-seeking?: Part 3

How do we solve the alignment problem?: Part 1

What is it to solve the alignment problem?: Part 2

Fake thinking and real thinking

2024

Takes on “Alignment Faking in Large Language Models”

Video and transcript of presentation on Otherness and control in the age of AGI

(Part 2, AI takeover) Extended audio/transcript from my conversation with Dwarkesh Patel

(Part 1, Otherness) Extended audio/transcript from my conversation with Dwarkesh Patel

Loving a world you don’t trust: Part 11

On attunement: Part 10

Video and transcript of presentation on Scheming AIs

On green: Part 9

On the abolition of man: Part 8

Being nicer than Clippy: Part 7

An even deeper atheism: Part 6

Does AI risk “other” the AIs?: Part 5

When “yang” goes wrong: Part 4

Deep atheism and AI risk: Part 3

Gentleness and the artificial Other: Part 2

Otherness and control in the age of AGI: Part 1

2023

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Superforecasting the premises in “Is power-seeking AI an existential risk?”

In memory of Louise Glück

Predictable updating about AI risk

Existential Risk from Power-Seeking AI (shorter version)

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism

Seeing more whole: Part 2

Why should ethical anti-realists do ethics?: Part 1

2022

Against meta-ethical hedonism

Against the normative realist’s wager

Is Power-Seeking AI an Existential Risk?

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Dutch books, Cox, and Complete Class: Part 4

VNM, separability, and more: Part 3

Why it can be OK to predictably lose: Part 2

Skyscrapers and madmen: Part 1

Simulation arguments

On infinite ethics

The ignorance of normative realism bot

Morality and constrained maximization, part 2: Part 2

2021

Morality and constrained maximization, part 1: Part 1

Anthropics and the Universal Distribution

On the Universal Distribution

In defense of the presumptuous philosopher: Part 4

An aside on betting in anthropics: Part 3

Telekinesis, reference classes, and other scandals: Part 2

Learning from the fact that you exist: Part 1

Can you control the past?

In search of benevolence (or: what should you get Clippy for Christmas?)

On the limits of idealized values

Problems of evil

The innocent gene

The importance of how you weigh it

On future people, looking back at 21st century longtermism

Against neutrality about creating happy lives

Care and demandingness

Subjectivism and moral authority

Two types of deference

Contact with reality

Killing the ants

Believing in things you cannot see

Actually possible: thoughts on Utopia

Shouldn’t it matter to the victim?

The despair of normative realism bot

2020

Alienation and meta-ethics (or: is it possible you should maximize helium?)

Wholehearted choices and “morality as taxes”

Thoughts on being mortal

Grokking illusionism

The impact merge

Thoughts on personal identity

How core is confusion about consciousness?

To light a candle

The gestures of trees

Mistaking the plot

How much computational power does it take to match the human brain?