Ajla Karajko

Microsoft’s state-of-the-art text-to-speech model

Microsoft has introduced VibeVoice, an open-source speech synthesis model that marks a major step forward in generating long-form, multi-speaker audio. With just 1.5 billion parameters, the model can generate up to 90 minutes of conversation with as many as four distinct speakers, keeping each voice consistent throughout the dialogue.

One of its biggest advantages is the ability to produce podcast-quality audio, which opens the door to AI-generated discussions, panels, and interviews that sound natural and consistent. Microsoft reports an 80x improvement in audio data compression. This makes it possible to run the model on consumer devices without massive cloud resources.

The model builds on the Qwen2.5 large language model. This lets it handle turn-taking naturally and produce context-aware speech over long conversations, much as people do. As a result, VibeVoice is a step ahead of earlier TTS systems, which were typically limited to one or two voices.

Microsoft has also built in safety mechanisms. Every generated clip includes an audible "generated by AI" disclaimer. It also carries hidden watermarks that allow its authenticity to be checked and clearly distinguish it from real human speech.

Until now, models could generally produce only short clips or two-speaker dialogues. Coordinating four different voices over a long-form recording brings us one step closer to AI generating entire podcasts or panel discussions. And because the model is open source and small enough to run on home devices, it could accelerate mass adoption and unlock new creative applications.


In brief: Tech World Highlights

  • Frontier, a carbon removal consortium backed by Google, has agreed to buy $31.3 million worth of carbon removal credits from the startup Planetary, covering the removal of 115,211 metric tons of carbon.
  • Google has introduced new AI features in Google Translate that help users learn new languages, inspired by the Duolingo app.
  • Scientists have developed a one-step process that converts mixed plastic waste into gasoline at room temperature and ambient pressure, with over 95% efficiency.
  • Elon Musk’s The Boring Company has reportedly begun testing Tesla’s Full Self-Driving system in the Las Vegas Convention Center tunnels that connect nearby hotels.
  • Researchers at the University of Queensland have achieved a world first by growing fully functional human skin in the lab.


Trending AI Tools:

  • Co-STORM – a tool that uses AI-powered search to write Wikipedia-style articles from scratch.
  • Hunyuan-A13B – Tencent’s new open-source hybrid-reasoning model.
  • Qwen VLo – Alibaba’s GPT-4o-like model for generating and editing images.
