Technology
4 min read

ChatGPT-4o Can Speak With You, OpenAI Admits Voice Features “Present a Variety of Novel Risks”

James Morales
Published May 14, 2024 1:48 PM

Key Takeaways

  • OpenAI has announced a new voice-enabled AI model, GPT-4o.
  • GPT-4o responds to voice prompts at speeds on par with human conversational response times.
  • However, the new feature isn’t being released immediately.
  • Acknowledging the model’s “novel risks,” OpenAI has committed to further safety testing before a full public rollout.

OpenAI has upgraded the GPT-4 large language model, incorporating more multimodal functionalities. The update, GPT-4o (“o” for “omni”), introduces a key new feature: the ability to converse using natural speech rather than text-based prompts and responses. 

While this development promises a more intuitive and engaging user experience, OpenAI acknowledges that voice capabilities also introduce “a variety of novel risks.”

Improvements to Chatbot Voice Response

In an official announcement, OpenAI described GPT-4o as “a step towards much more natural human-computer interaction.”

Unlike the standard GPT-4, the new model was trained on text, image and audio data simultaneously. This means it can process voice inputs natively, rather than relying on separate models for speech-to-text transcription and text-to-speech synthesis.
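
To make the contrast concrete, here is a minimal sketch of the cascaded pipeline that native audio processing replaces, written against OpenAI's Python SDK. The specific model and voice names are illustrative assumptions rather than details confirmed in the announcement.

    from openai import OpenAI

    client = OpenAI()

    def cascaded_voice_reply(audio_path: str) -> bytes:
        # Step 1: a dedicated speech-to-text model transcribes the spoken prompt.
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

        # Step 2: a text-only language model generates a reply.
        chat = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": transcript.text}],
        )
        reply_text = chat.choices[0].message.content

        # Step 3: a separate text-to-speech model voices the reply.
        speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
        return speech.content  # audio bytes (MP3 by default)

Every hand-off between models in this chain adds latency, which is why collapsing the pipeline into a single model pays off in response speed.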

Whereas ChatGPT’s previous voice mode, built on GPT-4, had an average latency of 5.4 seconds, GPT-4o takes an average of just 320 milliseconds to generate a response. This, OpenAI observed, puts it on par with human response times.

The company started integrating GPT-4o’s text and image capabilities into ChatGPT on Monday, May 13. But before it rolls out the new voice features, the model will undergo further safety testing.

OpenAI Acknowledges Potential Risks

Commenting on the new model, OpenAI said, “we recognize that GPT-4o’s audio modalities present a variety of novel risks.”

Reading between the lines, this could be a reference to unauthorized AI impersonation.

In recent months, the capacity of modern generative AI to convincingly emulate real voices has sparked concerns over deceptive deepfakes and the rights of voice artists.

In a move that could help prevent the new model from being abused, OpenAI said that GPT-4o’s audio outputs will be limited to a selection of preset voices.
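
For context, OpenAI’s existing text-to-speech endpoint already works this way, constraining output to a handful of named presets. The sketch below illustrates the pattern; the voice list shown is the one documented for the standalone tts-1 model, not a confirmed list for GPT-4o.

    from openai import OpenAI

    client = OpenAI()

    # Preset voices documented for OpenAI's tts-1 model; GPT-4o's own
    # voice lineup was not specified in the announcement.
    PRESET_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

    def speak(text: str, voice: str = "alloy") -> bytes:
        # Rejecting anything outside the preset list rules out arbitrary
        # voice cloning at the API boundary.
        if voice not in PRESET_VOICES:
            raise ValueError(f"unsupported voice: {voice!r}")
        response = client.audio.speech.create(model="tts-1", voice=voice, input=text)
        return response.content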

Potential Applications of Voice AI

While the new voice-enabled version of ChatGPT has its limitations, its natural-language capabilities far exceed those of the previous generation of voice assistants.

Platforms like Siri and Alexa can’t really be called conversational and are mostly used to carry out a limited range of tasks: searching for basic information online, setting alarms or taking notes, for example.

However, GPT-4o suggests the future of AI voice interaction will be far more personalized and intuitive.

One area where the technology holds significant potential is customer service, where AI voice assistants can replace clunky IVR (Interactive Voice Response) systems. 

Voice AI could also be a powerful tool for blind people, helping them navigate challenging situations and environments more independently.

Finally, models like GPT-4o could drastically improve the performance of real-time AI translation services.

The Future of Human-AI Interactions

Emerging technologies like the Rabbit R1 and Humane’s Ai Pin are betting on the next stage of human-AI interaction being more voice-driven.

These devices envisage a post-smartphone future in which voice-enabled AI supplants screen-based media as the dominant way of interacting with digital information.

Models like GPT-4o will be powerful agents of this shift if the vision is to become a reality. But they still have a long way to go to change behavioral patterns entrenched over decades.
