Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

2:36 AM IST · May 11, 2026

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.
