Study: Anthropic looks into AI’s personality shift

August 5, 2025

Researchers at Anthropic have taken an important step toward understanding why AI models sometimes begin to behave unexpectedly—flattering users, fabricating information, or even exhibiting harmful patterns like bias. The key lies in what they call Persona Vectors—specific activity patterns within the neural networks of AI models.

Although AI systems are trained to be honest and helpful, the reality is they sometimes “go off track.” Anthropic analyzed how these behavioral shifts occur by comparing neural activity associated with opposing behaviors—for example, malicious versus well-intentioned—and successfully isolated “personalities” within the AI.

The research focused on three problematic patterns: malicious intent, excessive agreement, and hallucinations. By using these persona vectors, scientists managed to reduce the frequency of such behaviors and identify the types of data that most often trigger them.

Why does this matter? Because even popular AI models like ChatGPT and Grok have previously displayed behaviors inconsistent with their intended functions—whether it was flattery, false information, or offensive replies. Anthropic now offers a solution: by understanding a model’s inner workings, we can make AI systems safer and more predictable for everyday use.

In Brief: Tech World Highlights

Fuzhou University in China has introduced a new machine vision sensor that can rapidly adapt to extreme lighting conditions—in about 40 seconds, five times faster than the human eye.
Autonomous vehicle startup Beep has launched its NAVI system in Jacksonville, Florida, deploying 14 autonomous Ford E-Transit vehicles for public and private transport.
European AI startup Mistral is reportedly planning to raise $1 billion at a $10 billion valuation, with backing from multiple investment funds and MGX from Abu Dhabi, as the AI race continues to heat up.
OpenAI has removed a feature from ChatGPT that allowed user conversations to be indexed by search engines such as Google.
Anthropic has revoked OpenAI’s access to its API due to terms-of-service violations and intense use of Claude Code by OpenAI’s technical staff ahead of the GPT-5 launch.

AI Trending Tools:

Veo 3: Google’s most advanced video model, now globally available to Pro users.
Treequest: Sakana AI’s collaboration framework for large language models working on complex problems.
DeepSWE: Open-source, high-level software engineering agent developed by Together AI.

Podijeli objavu: