Gemini Live: Enhancing Performance Through Rehearsals
What’s the point of chatting with a human-like bot if it’s an unreliable narrator — and has a colorless personality?
That’s the question I’ve been turning over in my head since I began testing Gemini Live, Google’s take on OpenAI’s Advanced Voice Mode, last week. Gemini Live is an attempt at a more engaging chatbot experience — one with realistic voices and the freedom to interrupt the bot at any point.
Gemini Live is “custom-tuned to be intuitive and have a back-and-forth, actual conversation,” Sissie Hsiao, GM for Gemini experiences at Google, said in May. “[It] can provide information more succinctly and answer more conversationally than, for example, if you’re interacting in just text. We think that an AI assistant should be able to solve complex problems … and also feel very natural and fluid when you engage with it.”
After spending a fair amount of time with Gemini Live, I can confirm that it is more free-flowing and natural-feeling than Google’s previous attempts at AI-powered voice interactions (see: Google Assistant). But it doesn’t address the problems of the underlying tech, like hallucinations and inconsistencies — and it introduces a few new ones.
The un-uncanny valley
Gemini Live is essentially a fancy text-to-speech engine bolted on top of Google’s latest generative AI models, Gemini 1.5 Pro and 1.5 Flash. The models generate text that the engine speaks aloud; a running transcript of conversations is a swipe away from the Gemini Live UI in the Gemini app on Android (and soon the Google app on iOS).
For the Gemini Live voice on my Pixel 8a, I chose Ursa, which Google describes as “mid-range” and “engaged.” (It sounded to me like a younger woman.) The company says it worked with professional actors to design Gemini Live’s 10 voices, and it shows. Ursa was indeed more expressive than many of Google’s older synthetic voices, particularly the default Google Assistant voice.