New Gemini 2.5 capabilities
Native audio output and improvements to Live API
Today, the Live API is introducing a preview version of audio-visual input and native audio output dialogue, so you can directly build conversational experiences with a more natural and expressive Gemini.
It also allows the user to steer the model's tone, accent, and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. And it supports tool use, so the model can search on your behalf.
You can experiment with a set of early features, including:
- Affective Dialogue, in which the model detects emotion in the user’s voice and responds appropriately.
- Proactive Audio, in which the model ignores background conversations and knows when to respond.
- Thinking in the Live API, in which the model leverages Gemini’s thinking capabilities to support more complex tasks.
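To make the options above concrete, here is a minimal sketch of how a native audio Live API session might be configured. The field names below (such as `enable_affective_dialog`, `proactivity`, and `thinking_config`) are modeled on the Live API's session setup payload and should be treated as assumptions, not a verified schema; the voice name is illustrative.

```python
# Hedged sketch: field names are assumptions modeled on the Live API's
# session setup payload, not a verified schema.

def build_live_config(voice_name: str = "Puck",
                      affective_dialogue: bool = True,
                      proactive_audio: bool = True,
                      thinking: bool = False) -> dict:
    """Assemble a session-config dict for a native audio dialogue session."""
    config = {
        # Ask for spoken (native audio) responses rather than text.
        "response_modalities": ["AUDIO"],
        # Pick a prebuilt voice for the model's native audio output.
        "speech_config": {
            "voice_config": {"prebuilt_voice_config": {"voice_name": voice_name}}
        },
    }
    if affective_dialogue:
        # Affective Dialogue: respond to emotion detected in the user's voice.
        config["enable_affective_dialog"] = True
    if proactive_audio:
        # Proactive Audio: let the model decide when it is appropriate to
        # respond, ignoring background conversations.
        config["proactivity"] = {"proactive_audio": True}
    if thinking:
        # Thinking: reserve a token budget for reasoning on complex tasks
        # (the budget value here is an arbitrary example).
        config["thinking_config"] = {"thinking_budget": 1024}
    return config
```

Building the configuration as a plain dictionary keeps the sketch self-contained; in practice you would pass an equivalent config object when opening the Live API session.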
We’re also releasing new previews for text-to-speech in 2.5 Pro and 2.5 Flash. These offer first-of-their-kind support for multiple speakers, enabling text-to-speech with two voices via native audio output.
Like native audio dialogue, text-to-speech is expressive and can capture subtle nuances, such as whispers. It works in more than 24 languages and seamlessly switches between them.
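A two-voice request can be sketched as a request body that pairs each named speaker in the script with a voice. The payload shape below (`multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`) follows the documented REST structure for speech generation, but the field names, voice names, and the helper itself are assumptions for illustration.

```python
# Hedged sketch: the request-body field names below are assumptions modeled
# on the REST payload for multi-speaker speech generation.

def build_tts_request(turns: list[tuple[str, str]],
                      voices: dict[str, str]) -> dict:
    """Build a request body for two-voice text-to-speech.

    turns  -- ordered (speaker, line) pairs making up the dialogue script
    voices -- mapping from speaker name to a prebuilt voice name
    """
    # The script is sent as plain text, one "Speaker: line" per row.
    script = "\n".join(f"{speaker}: {line}" for speaker, line in turns)
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            # Request audio output rather than text.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    # One voice config per named speaker in the script.
                    "speakerVoiceConfigs": [
                        {
                            "speaker": speaker,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voice}
                            },
                        }
                        for speaker, voice in voices.items()
                    ]
                }
            },
        },
    }
```

For example, `build_tts_request([("Joe", "Hi there!"), ("Jane", "Hello!")], {"Joe": "Kore", "Jane": "Puck"})` yields a body with a two-entry `speakerVoiceConfigs` list, one per voice.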