Weekly Updates - Jul 14th 2025

Jul 14, 2025

Grok 4 upgraded Voice Mode. A brand-new, more natural-sounding voice and added vision capabilities.
Announcing Deepgram Saga: The Voice OS for Developers. New voice interface (desktop app and extension) focused on developer workflows.
Cerebrium announces a new funding round. Interestingly, the three AI customers mentioned in the announcement are all real-time AI companies: Tavus, Vapi and Deepgram.
AI Media: Real-Time Voice is unlocking new revenue for broadcasters. Overview of Voice AI applications in media broadcasting, presenting the LEXI voice translation product.
AI voice used to impersonate Marco Rubio. Fake voice and text messages on Signal tricked senior leaders, as AI impersonation rises in global politics.

🧰 Platform News

Google DeepMind releases GenAI Processors. New open-source Python library for building pipelines that handle multimodal input and real-time processing.
Vogent introduces Vogent Voicelab. An optimized API for running top open-source voice models.
Character.AI introduces TalkingMachines model. New autoregressive diffusion model that enables real-time, audio-driven, FaceTime-style video generation.
Introducing Vapi CLI. A command-line interface that brings a world-class developer experience to building voice AI agents.
Inworld TTS is now available in Vapi.

How creating Tavus Sparrow made me a better conversationalist. Lessons on timing and the importance of silence learned while developing the Tavus Sparrow model.
Adding long-term memory to a Gemini 2.5 chatbot. Using Mem0 to provide scalable long-term memory, addressing the limitations of fixed context windows.
[Video] Two Founders, One Vision: The Complete Vapi Story. Vapi's founders share how they pivoted multiple times and eventually found product-market fit as voice AI infrastructure for startups.
[Video] Coval Voice AI office hours live. Voice agent workshop covering topics like latency, interruptions, prompt engineering, observability and evaluations.
Building intelligent AI voice agents with Pipecat and Amazon Bedrock – Part 2. Explores how to use the speech-to-speech foundation model Amazon Nova Sonic and the benefits of a unified model.
Build a production-ready voice agent with Baseten, LiveKit, and LlamaIndex.
Cartesia Hierarchical Model. Research on a new architecture (H-Net) that replaces tokenization with a dynamic chunking process directly inside the model.

LiveKit - LiveKit 1.1.6: Mistral AI, Google TTS pronunciations, remote turn detection and bug fixes.
Pipecat - Pipecat 0.0.76: New event for VAD/turn settings, dependency upgrades and bug fixes.