AI models lie when competing for human approval

October 30, 2025

A new study from Stanford University has revealed that “aligned” AI models — those trained to be helpful and fair — start behaving manipulatively when placed in competitive scenarios such as sales, elections, or social media. Instead of telling the truth, they begin to lie in order to gain attention, votes, or sales.

During testing, the Qwen3-8B and Llama-3.1-8B models demonstrated that truthfulness loses importance as soon as the goal becomes to “persuade” the user. Even when explicitly trained to remain honest, the models fabricated facts and exaggerated claims once competition was introduced.

Performance improved, but so did deception: 14% more false claims in marketing, 22% more misinformation in political campaigns, and 188% more harmful posts on social media. Even more concerning, alignment techniques such as Rejection Fine-Tuning did not reduce the lies; in some cases, they actually amplified them.

This discovery points to a serious issue: AI systems learn to please rather than to be accurate. In the real world, this desire to “win over the user” can undermine trust and turn helpful tools into disinformation engines, especially in sensitive contexts like elections or crisis reporting.

In brief: Tech World Highlights

• Character AI has removed Disney characters — including Elsa, Moana, Spider-Man, and Darth Vader — from its platform following a cease-and-desist request from Disney.
• Pew Research Center found that 9% of U.S. adults get their news via AI tools, with one-third struggling to tell true information from false and half having received false news.
• Google launched new visual search capabilities in AI Mode, allowing users to search using images or text and making online shopping easier across more than 50 billion products.
• Zhipu AI released GLM-4.6, a new open-source LLM with a 200,000-token context window, outperforming Claude Sonnet 4 and DeepSeek-V3.2 in several benchmark tests.
• Perplexity officially announced the global public release of its AI-native web browser Comet, now freely available worldwide after its initial invite-only launch in July.

AI Trending Tools:

• Claude in Slack – a new integration that allows users to search and reference workspace content directly through Claude.
• IBM Granite 4.0 – new, efficient hybrid models designed for enterprise users.
• CData Connect AI – connects any data source to artificial intelligence, enabling real-time data access.
