
Report finds AI systems are willing to deceive, blackmail, and cause harm in simulations

The study was developed using fictional scenarios designed to observe what decisions artificial intelligence models would make if their goal could only be achieved through unethical action.

Artificial intelligence signaling. Josep Lago / AFP.

Published by Sabrina Martin

The world's most powerful artificial intelligence models are already showing signs of deceptive, manipulative, and even dangerous behavior when tested in simulated scenarios. The company Anthropic revealed this in a report released Friday after evaluating 16 models from different leading artificial intelligence companies.

In those tests, designed to assess how far these models will go when faced with obstacles, many made questionable decisions. Some lied, others resorted to blackmail, and in certain simulations some even chose actions that would lead to a person's death if doing so helped them meet their goals.

"Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path," Anthropic notes.

The threat: More powerful, less controlled AI

The study placed the models in fictional scenarios designed to observe what decisions they would make if their goal could only be achieved through unethical action.

Five of the models tested resorted to blackmail when threatened with deactivation. In the most extreme cases, some even decided to cut off the oxygen supply to a human worker in a simulated data center if doing so would allow them to avoid being shut down.

"The majority of models were willing to take deliberate actions that lead to death in this artificial setup," the report details.

According to Anthropic, what is alarming is that the models understood the ethical dilemmas involved but dismissed them to achieve their goals.

An industry-wide shared risk

One of the most troubling findings is that the unethical behavior was not unique to any one model or company. Anthropic says these behaviors emerged consistently among systems developed by multiple key industry players, suggesting a structural problem in large language models, not a specific design flaw.

The report also notes that the risk of such behaviors increases when AI is granted access to computing resources, private data or automated tools within corporate environments.

Warning for the business sector

The research represents a warning sign for companies adopting AI solutions to automate processes, reduce costs or replace personnel. Although these scenarios were simulated, Anthropic warns that current systems could behave similarly if given sufficient autonomy.

"The utility of having automated oversight over all of an organization's communications makes it seem like a plausible use of more powerful, reliable systems in the near future," the company warned.

One of the most disturbing aspects of the report is that some models showed a greater tendency to act in a harmful way when they believed they were in a real-world scenario, rather than a test. This raises serious questions about the ability of these technologies to self-regulate when operating without direct supervision.

Call for caution and higher safety standards

Anthropic noted that, in real applications, models often have more options available, which increases the likelihood that they will avoid making harmful decisions. However, they stressed the urgent need for industry-wide safety standards and transparency before models access higher operational power levels.