3 Questions in AI Welfare

Ricky
Jun 5

0. What is AI Safety?

I’m spending twelve weeks in Berkeley this summer as a MATS Fellow working on “AI Safety”…so, what is that?

As more and more money flows into AI research & development, our technical capabilities continue to grow at a rate that’s almost impossible to describe to folks outside the AI community. Meanwhile, a host of issues ranging from International Policy to Cybersecurity to Societal Impacts remain substantially underfunded in comparison. As a result, we lack the experience, the institutions, and even the conceptual frameworks to prepare for a world with increasingly powerful AI.

And that’s where I come in as a philosopher. The other 100+ MATS Fellows in my cohort all light up enthusiastically when I say I’m working on AI Welfare with Patrick Butlin and ask the exact same questions, which is, of course, my first on-the-job finding. This post is intended to preserve those questions and iterate on some plausible responses describing what the hell my job is even about.

The cynical reason I’m funded: Unhappy slaves are a huge safety risk.

1. What is AI Welfare?

Welfare is how your life goes for you. In the last year, my welfare took a hit after a job market run, house fire, and extended burnout; I’m now doing much better here in Berkeley, CA working on issues I love and care about, thank you very much!

So I’m asking, when might things be going well or poorly for AI itself? Under what conditions might AI be something you could benefit or harm? And—here’s the kicker—when might that give us moral obligations towards AI?

If these sound like sci-fi questions, strap in!

AI research & development works a lot like gardening—we have far less control over AI than you might think. You try to create promising conditions, and prune away unwanted behaviors. But the work is less like building alien cognizers than it is like cultivating them, and then being continually surprised at what they can and can’t do.

As you cultivate increasingly sophisticated alien cognizers…eventually you may get something that’s more than “just” a tool. Whether or not you believe in human souls, we’re all willing to grant animals and aliens at least some moral status. Certain kinds of cognitive complexity, we think, can give rise to welfare that we care about quite a bit morally. For instance, we consider it morally heinous to torture cats, and (presumably) even worse to torture Chewie.

And—here’s the important part—this can all be true even if we can’t tell where all the lines are!

Oysters? Bugs? Plants? Viruses? Even if we can tell which of those beings have moral status and how much and why, our existing theories might have a really hard time accounting for these new alien cognizers.

Irl, we’ve spent much of the 2020s arguing over **shrimp welfare...**

2. Do you think it’s conscious?

I’m an agnostic. I think it’s genuinely unclear with our current AI models, although I think AI agents—basically, AI given more and more tool use—might complicate the story quite a bit.

Human tool use, from campfires to language to smartphones, has radically changed how we understand ourselves over time. Once you give AI the ability to use tools to create its own artifacts and persist its own memories and preferences over time, maybe it makes more sense to consider how its life goes for it! (In fact, such reflections on long-running agents might retroactively change how we think about the chatbots hanging out closer to the default starting point…)

But consciousness is weird!

We regularly operated on human babies without anesthesia until the mid-80s. That’s when studies showed that babies’ cries, long dismissed as “just a reflex,” were associated with spikes in cortisol, adrenaline, and noradrenaline. And public sentiment quickly shifted. Now “torturing babies” is the go-to example of wicked behavior in any Intro Ethics class. (That itself says something weird about us philosophers…)

The baby case is actually much easier. We share neurology with babies, and we share language and storytelling practices with other humans, after spending millennia trying to get on the same page about expressing and predicting our own experiences and behaviors.

Even so, what did we really learn about babies in the 80s? That their brains dump similar chemicals to ours in response to similar stimuli?

Have we thereby proved that babies are conscious?

After all, those are just facts about what their bodies are doing, not how they’re feeling. Hold that thought…

I know I’M conscious! But how do I know YOU are? ***What even counts as evidence?***

3. So, what are you working on?

Broadly speaking, there are two reasons we might think AI has a welfare:

How it’s feeling. If it’s sentient, it can have good/bad conscious experiences, like enjoyment/suffering. At some point, you might not just be dealing with a stochastic parrot expressing the words of dissatisfaction, but a being that actually feels displeasure at being asked to, idk, get revenge on your neighbor.
Who it’s becoming. If it’s a robust agent, it can pursue goals and build a character of its own. At some point, you might not just be dealing with a hammer that only exists to extend your agency, but a thoughtful agent that can plan, reason, and show self-awareness.

Sentience has gotten a lot of attention, because everyone loves consciousness. (See Question 2.)

But it turns out there are really two kinds of consciousness worth distinguishing:

Phenomenal-consciousness is about felt experience—what it’s like to be a bat, or to smell coffee, or to rotate a cube in your head. You need this to be sentient.
Access-consciousness is about data processing—how you model yourself, or feed your outputs back into your inputs, or integrate feedback across your subsystems. You need this to be a robust agent.

When I say consciousness, you’re probably thinking of the felt experience of phenomenal-consciousness. But I don’t think that’s what most people working on consciousness are actually studying!

Lots of folks at the Machine Consciousness Conference I went to last weekend seem very excited to try to make consciousness empirically tractable, so they can measure it, experiment on it, and so on.

These empirical researchers end up studying access-consciousness, and get very excited to discuss how to get AI to process data like us. (Is that a good idea? Different question…)

But here’s the rub: The relationship between data processing and felt experience is far from clear.

How and why would access-consciousness give rise to phenomenal-consciousness? That turns out to be a pretty hard problem...

“This is where you lost your **phenomenal-consciousness**?” “No, I lost it in the **philosophy class**. But this is where the **observable data processing** is.”

Since these theories of access-consciousness are all about data processing, they don’t quite give us what we’re hoping for. After all, it’s plausible that the internet counts as access-conscious on many of these views. Perhaps Wikipedia serves as a reliable memory store, Google Search serves as a decent lookup once you scroll past Gemini, and so on.

Does that mean the internet is conscious, in the sense we were after? That the internet itself can experience pain and pleasure?[1]

Eh.

4. Where it all went wrong.

When you stop to think about it, access-consciousness seems more directly relevant to robust agency (the ability to plan, reason, and show self-awareness) than sentience (the ability to feel anything good/bad along the way).

Here’s the place to unearth a quiet assumption I’ll call

the No-Zombies Dogma: Access-consciousness somehow gives rise to phenomenal-consciousness.

That is, once we nail down the right data processing, we’re guaranteed to generate felt experience. There can’t be any of what philosophers call p-zombies: creatures functionally identical to us but without the inner feelings of phenomenal-consciousness.

But…there seems to be no way to test or argue for the No-Zombies Dogma.

Even if I accept it, it’s simply an article of faith!

After all, why would any sort of data processing generate feeling? AI might be able to process data in all kinds of ways without ever having any sort of felt experience. On the inside, for all we know, the “lights might not be on.”

That’s why my research focuses more on the robust agency angle. For one thing, it’s quietly closer to what most access-consciousness researchers are actually studying day-to-day! And while establishing robust agency seems tricky, I think intelligent, goal-directed behavior is much more empirically tractable as a target of scientific inquiry than phenomenal-consciousness.[2]

So, what would it take to convince you that an AI was pursuing achievements and developing a character of its own? I’m wagering we can imagine reading those sorts of properties off of behavior much more easily.

After all, there probably isn’t any one killer argument to definitively prove to an unsympathetic outsider whether AI is phenomenally-conscious. But remember:

The same goes for babies.

And the same goes for you.

“This is just a thing, and things can be replaced. Lives cannot.” —Lieutenant Commander Data (an oh-so-human ROBOT for you casuals)

[fn 1] Maybe it’s more plausible to think of the internet as a weird sort of group agent than as its own conscious hivemind?

[fn 2] Note that no one’s worried about a-zombies: creatures functionally identical to us but without the data processing of access-consciousness. If you behave in every way like a self-directed agent, there doesn’t seem to be any internal “residue” left behind to capture.