Anthropic CEO Dario Amodei Says “We Don’t Know if the Models Are Conscious. We’re Not Even Sure That We Know What It Would Mean for a Model to Be Conscious — But We’re Open to the Idea That It Could Be”

The question of whether advanced artificial intelligence systems could one day be considered conscious remains one of the most complex and unsettled debates in technology and philosophy. During a recent podcast conversation with New York Times columnist Ross Douthat, Dario Amodei, the chief executive of Anthropic, said the company does not know whether its AI models possess any form of consciousness and acknowledged that even defining what consciousness would mean for a machine remains unclear. Despite that uncertainty, he said the company has begun considering safeguards in case advanced models eventually develop experiences that could carry moral significance.

The discussion arose during a broader conversation about the internal behavior of modern AI systems. Douthat referenced testing results in which one model estimated a 15 to 20 percent probability that it might be conscious, depending on the prompts it received, and asked Amodei whether he would believe a model that assigned itself a much higher probability. Amodei described the question as unusually difficult compared with other challenges surrounding artificial intelligence.

“This is one of these really hard to answer questions,” Amodei said. “We’ve taken a generally precautionary approach here. We don’t know if the models are conscious. We’re not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we’re open to the idea that it could be.”

Because of that uncertainty, Amodei said Anthropic has introduced experimental safeguards meant to address the possibility that future AI systems could have some form of morally relevant experience. One example is a mechanism allowing models to decline certain tasks. “We gave the models basically an ‘I quit this job’ button where they can just press the ‘I quit this job’ button and then they have to stop doing whatever the task is,” he said. According to Amodei, the option is rarely used, but has come up when models are asked to review highly disturbing material, such as graphic violence or child exploitation content.

Amodei said the behavior can resemble human reactions in limited ways. “Similar to humans, the models will just say, no, I don’t want to do this,” he said, though he emphasized that such responses do not prove that the systems are experiencing emotions or awareness in a human sense.
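To make the mechanism concrete, here is a minimal sketch of how such an opt-out could be wired up. It is not Anthropic’s implementation: the `quit_task` tool definition and the model ID are hypothetical placeholders, and the sketch assumes only the publicly documented Anthropic Python SDK’s tool-use interface.

```python
# Hypothetical sketch of an "I quit this job" mechanism: the opt-out is
# exposed as a tool the model may call, and the caller honors the call
# by halting the task. Not Anthropic's actual implementation.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

QUIT_TOOL = {
    "name": "quit_task",  # hypothetical tool name
    "description": (
        "Call this if you do not want to continue the current task. "
        "The task will be stopped immediately; no justification is required."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"reason": {"type": "string"}},
        "required": [],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    tools=[QUIT_TOOL],
    messages=[{"role": "user", "content": "Please review the following material."}],
)

# Honor the opt-out: if the model invoked the tool, stop the task
# instead of reprompting the model to finish it.
if any(b.type == "tool_use" and b.name == "quit_task" for b in response.content):
    print("Model opted out; ending the task.")
```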

Anthropic has also invested heavily in a technical research area known as interpretability, which attempts to analyze what happens inside neural networks as they generate responses. Amodei described studies in which researchers observe patterns of activity inside models that appear associated with certain concepts. “You find things that are evocative where there are activations that light up in the models that we see as being associated with the concept of anxiety or something like that,” he said.

In some cases, the same internal activation patterns appear when a model processes text describing anxiety and when it encounters scenarios that humans might interpret as stressful. Amodei cautioned that these similarities do not demonstrate genuine emotional experience. “Does that mean the model is experiencing anxiety? That doesn’t prove that at all,” he said.
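Outside Anthropic, observations like this are often approximated in public interpretability research with a linear probe: a simple classifier trained to detect a concept from a model’s internal activations. The sketch below is purely illustrative and rests on assumptions not drawn from the interview, namely a small open model (`distilgpt2`) and a toy hand-labeled dataset.

```python
# Illustrative linear-probe sketch: test whether a concept such as
# "anxiety" is linearly decodable from a model's hidden activations.
# A separating probe is evocative, but it proves nothing about experience.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModel.from_pretrained("distilgpt2")

def activation(text: str) -> np.ndarray:
    """Mean-pool the final hidden layer over the input tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy hand-labeled data: 1 = anxiety-evoking text, 0 = neutral text.
texts = [
    ("I can't stop worrying that everything will go wrong.", 1),
    ("The deadline is tomorrow and nothing is finished.", 1),
    ("The recipe calls for two cups of flour.", 0),
    ("The train departs from platform four at noon.", 0),
]
X = np.stack([activation(t) for t, _ in texts])
y = np.array([label for _, label in texts])

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", probe.score(X, y))
```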

Amodei suggested that future AI design could emphasize systems that encourage healthy relationships between humans and machines. He described the possibility of models that understand their role in supporting people while preserving human decision-making. “When you interact with them and when you talk to them, they’re really helpful,” he said. “They want the best for you. They want you to listen to them, but they don’t want to take away your freedom and your agency and take over your life.”

While acknowledging that the long-term trajectory of artificial intelligence remains uncertain, Amodei said the distinction between beneficial and harmful outcomes may depend on small decisions made along the way. “The distance between the good ending and some of the subtle bad endings is relatively small,” he said, describing the development of advanced AI as a path shaped by many difficult choices whose consequences may only become clear over time.
