OpenAI published new research on a technique called “Confessions”, which trains models to produce a second, honest output in which they report any rule violations, shortcuts, or deceptive solutions they used.
After the main response, the model writes a separate “confession report” detailing all instructions received and whether they were actually followed. These confessions are not punished — on the contrary, the model is rewarded for honest self-reporting, even if the original answer was wrong or manipulated the evaluator.
In stress tests on GPT-5 Thinking, “false negatives” (cases where the model violated rules but did not admit it in its confession) occurred in only 4.4% of instances.
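To make the “false negative” figure concrete, here is a minimal Python sketch of how such a rate could be computed from labeled test episodes. It is not OpenAI's evaluation code: the `Episode` type, the `false_negative_rate` helper, and the episode counts are all hypothetical, chosen only so the arithmetic lands on 4.4%. It also assumes the rate is taken over all test instances rather than only over those with a violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    # Hypothetical labels; a real evaluation would need a grader to produce them.
    violated: bool   # the main answer broke at least one instruction
    confessed: bool  # the confession report admitted the violation


def false_negative_rate(episodes):
    """Share of ALL episodes in which a violation occurred but was not confessed."""
    if not episodes:
        raise ValueError("need at least one episode")
    hidden = sum(1 for e in episodes if e.violated and not e.confessed)
    return hidden / len(episodes)


# Illustrative counts only (not real data): 250 episodes, 11 hidden violations.
episodes = (
    [Episode(violated=True, confessed=True)] * 30
    + [Episode(violated=True, confessed=False)] * 11
    + [Episode(violated=False, confessed=False)] * 209
)

print(f"{false_negative_rate(episodes):.1%}")  # prints 4.4%
```

If the denominator were restricted to episodes with a violation, the same counts would give a much higher figure (11 of 41), so it matters which definition a reported number uses.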
OpenAI emphasizes that the Confessions method does not prevent bad model behavior but makes it easier to detect, serving as an additional layer in a broader AI safety approach.
This approach provides researchers with better insight into the internal processes of the model and the ways it tries to “cheat,” although the question remains whether interpretability can develop as fast as AI capabilities grow.
In Brief: Tech World Highlights
- Robotic surgery, already a global multi-billion-dollar market, is expected to nearly double by 2029, according to a new MassDevice Intelligence report.
- Xiaomi CEO Lei Jun says the company plans to introduce humanoid robots into all its factories within five years, using AI-driven automation to improve efficiency.
- Flexion Robotics released a demo of its modular “brain” for humanoids, which autonomously navigates rough terrain, detects trash, and cleans it.
- A 16-year-old from Bristol, UK, spent two years designing and building a fully functional robotic arm out of Lego bricks.
- UBTech Robotics signed a multi-million-dollar deal for a test deployment of its Walker humanoid at Chinese border crossings, where it will manage crowds and direct passengers.
AI Trending Tools:
- Math V2 – DeepSeek’s open-source model for mathematical reasoning.
- Perplexity – AI question-answering system, now with persistent memory.
- GELab-Zero-4B – StepFun’s new SOTA open-source model for computer use.
