Imagine GPT trying to solve a murder. Not in theory, but inside the game Phoenix Wright: Ace Attorney, where you have to investigate crime scenes, question witnesses, and present the right evidence in court. That’s exactly the test researchers from Hao AI Lab at UC San Diego ran – they put leading AI models through one of the most iconic detective games.
OpenAI’s o1 and Gemini 2.5 Pro performed the best – they identified 26 and 20 correct pieces of evidence respectively and made it to level 4. But even they couldn’t fully solve the case. All other models seriously struggled – none managed to present even 10 valid pieces of evidence.
The biggest surprise? GPT-4.1, despite being the newest, delivered weaker results than expected – only 6 correct attempts, the same as Claude 3.5 Sonnet, which is a few months older.
Why does this matter? Games like Ace Attorney aren’t just entertainment – they force AI to combine visual recognition, multi-step reasoning, and timely decision-making. In short, they test the kind of natural human thinking that AI is still learning.
Maybe AI can’t be Phoenix Wright just yet – but it’s clearly moving in that direction.
So next time you wonder how “smart” AI really is – remember it still gets confused by basic video game logic.
In brief: Tech World Highlights
- Uber is partnering with self-driving tech startup May Mobility to deploy thousands of autonomous vehicles on its ride-hailing platform across U.S. cities.
- Anthropic launched Integrations, enabling Claude to connect to remote MCP (Model Context Protocol) servers and use external tools – alongside expanded research capabilities such as web search.
- NVIDIA criticized Anthropic’s recommendations on AI chip export controls, arguing that U.S. firms should focus on innovation instead of limiting competitiveness through policy.
- Google expanded AI Mode in Search to all Labs users in the U.S., introducing new visual shopping and local planning features.
- Suno released version 4.5 of its AI music generation platform, adding new genres, improved guidance, better prompt adherence, and song lengths of up to 8 minutes.
Trending AI tools:
- Ernie 4.5 Turbo & X1 Turbo – Baidu’s latest fast and cost-efficient LLMs.
- OpenAI Deep Research – A lightweight version powered by the o4-mini model.
- Kimi-Audio – Moonshot AI’s new open-source, state-of-the-art audio model.