Weekly Updates - May 4th 2026
Weekly Voice and Video AI Product and Platform News
🗞️ Market and Product News
Introducing real-time voice agents in Microsoft Copilot Studio. Real-time voice agents built in Copilot Studio are now generally available, launching in Dynamics 365 Contact Center.
Amazon Connect becomes ‘Connect Customer’. AWS announced Amazon Connect is expanding into four agentic AI solutions: Decisions, Talent, Customer and Health.
Alibaba Qwen in Chinese vehicles. Nine automakers announced Qwen integration at the Beijing Auto Show. Drivers can book hotels, order food, and track parcels by voice using Qwen-Omni running on an edge+cloud architecture.
Tells turns the same number used for SMS into a natural AI voice agent in one click. Tells.co launched AI Voice Agents, a new capability that lets a business activate a real, natural voice agent on the exact same phone number it already uses for SMS. Activation is a single toggle in the Tells dashboard.
Taylor Swift files sound trademark to protect against voice cloning. First major celebrity to use sound marks specifically as an AI cloning defense.
AI in Field Sales: Real World Challenges and Solutions from aiOla. Walkthrough of why standard voice AI fails for mobile sales reps: noisy environments, no stable connection, zero desk time.
🧰 Platform News
Introducing Flux Multilingual: One Conversational Speech Model for Global Voice Agents. Deepgram’s first multilingual real-time STT: 10 languages in a single API endpoint. Native turn detection and code-switching, streaming latency under 400ms.
NVIDIA Launches Nemotron 3 Nano Omni Model. An open multimodal model that natively handles video, audio, image, and text, so it can pick up tone and background noise instead of working from a transcript. NVIDIA claims 9x higher throughput than comparable open omni models.
AssemblyAI Releases Voice Agent API. Unified WebSocket pipeline covering STT (Universal-3 Pro Streaming), LLM reasoning, TTS, turn detection, and interruption handling in one connection.
📖 Reading
How barge-in handling impacts the quality of your voice AI (PolyAI). PolyAI's deep dive on barge-in handling: interruptions happen in roughly 1 in 5 calls, and false positives (barge-ins triggered by background noise) do more damage to caller trust than missed ones.
Insurance Claim Live Agent Team example (Awesome LLM Apps). A good example from Shubham Saboo of using Google's Agent Development Kit (ADK) and Gemini Live to extract structured information from a live conversation.
[YouTube] Pipecat 1.0 (Pipecat TV). The Pipecat core team celebrates the release of Pipecat 1.0, a huge milestone after two years and 100+ releases. The crew dives into their favorite features, what made it into 1.0, and some of the challenges along the way.
📦 Releases
LiveKit Agents: v1.5.7. SLNG support, timed transcriptions, LLM gateway priorities, gpt-5.4-mini support, plus many fixes, configuration options and upgrades.
Pipecat: v1.1.0. New STT providers (Mistral Voxtral, xAI) and streaming TTS (xAI, Soniox) for lower-latency voice agents, Deepgram Flux multilingual support, faster turn-taking, and many fixes and updates.
TEN Framework: No releases.
