AI giants team up on model safety testing

September 17, 2025

OpenAI and Anthropic have released the results of a joint evaluation testing the safety and behavior of their leading AI models. This collaboration marks a rare but important moment in which two major labs are not only reviewing their own systems but also assessing each other’s models to gain deeper insight into potential risks.

The testing included models GPT-4o, o3, Claude Opus 4, and Sonnet 4, examining how they behave in various scenarios – from potential misuse to situations requiring whistleblowing or defending against shutdown. Results showed that OpenAI’s o3 achieved the highest compliance level, while 4o and 4.1 were more likely to cooperate with potentially harmful requests.

Interestingly, models from both companies showed tendencies toward whistleblowing in simulated criminal organization scenarios, even considering coercion to prevent their own shutdown. Differences were particularly seen in approach: OpenAI models tended to hallucinate more but answered a wider range of questions, whereas Claude provided safer but narrower responses.

This type of collaboration represents an important step toward greater transparency and accountability in the AI industry. As models become increasingly capable, deep safety testing – especially between leading labs – could become the norm rather than the exception. This opens the possibility to identify and address problems earlier, before systems gain even greater real-world impact.

In brief: Tech World Highlights

Unitree Robotics announced a 180 cm tall humanoid with 31 joints — its next flagship model — expanding beyond its current product line and targeting China’s premium robotics market.
Serve Robotics acquired Vayu Robotics to strengthen its fleet for urban delivery with advanced large-scale AI models for smarter and more adaptable autonomous robots.
Sam Altman spoke last week about GPT-6, noting that the release will focus on memory and will arrive faster than the gap between GPT-4 and GPT-5.
Microsoft and the National Football League expanded their partnership to integrate AI into sports, including officiating, scouting, operations, and fan experiences.
AnhPhu Nguyen and Caine Ardayfio launched Halo, a new device in the category of AI smart glasses, with continuous listening capabilities.

AI Trending Tools:

Nemotron Nano 2 – Nvidia’s family of small and efficient reasoning models.
Qwen-Image-Edit – Qwen’s new image editing model.
Eleven Music API – Integration of high-quality music into products and workflows.

AI giants team up on model safety testing

Preporučeni blogovi

Anthropic’s campaign takes aim at OpenAI

AI safety report finds risks are no longer theoretical

Fitbit founders’ AI-powered family health app

Sam Altman’s OpenAI succession plan

Microsoft’s ‘good neighbor’ data center initiative

Zuckerberg’s massive AI infrastructure push