Weekly Updates - May 18th 2026
Weekly Voice and Video AI Product and Platform news
Market and Product News
OpenAI acquires Weights.gg. OpenAI quietly acquihired the team behind Weights.gg (the Replay voice-cloning app, ~$4M raised), which shut down in March 2026. The staff was folded into various OpenAI teams; no standalone product is planned.
VAPI raises $50M Series B. The platform has handled over 1 billion AI voice calls and grown enterprise revenue 10x year-over-year, with 1M developers building on it and minimal marketing spend behind those numbers.
Thinking Machines previews a new interaction model. Mira Murati and John Schulman's startup previewed TML-Interaction-Small, built for full-duplex voice and video conversation. Turn-taking latency is 0.40 seconds, roughly the pace of natural conversation. It scores 77.8 on FD-bench V1.5, versus 54.3 for Gemini 3.1 Flash Live and 46.8 for GPT-realtime-2.0.
Reactor Inc releases a beta of its real-time AI world generation infrastructure. The ex-Apple and Luma AI founders launched a public beta of real-time AI world model generation: users explore dynamically rendered 3D environments live in a browser. The CTO's demo hit 7.8M views. The company is positioning itself as an infrastructure layer.
Better.com's voice AI agent resolved 35.5% of its interactions without human involvement. Loan officers saved 1,666 hours per month, origination costs dropped 41%, and lead-to-lock conversion doubled. The agent is built on ElevenLabs Agents for lower-latency TTS and the compliance controls required in mortgage lending.
Platform News
Deepgram announces Flux Multilingual for Restaurants. Voice-native foundation models and workflows purpose-built for noisy, fast-paced restaurants.
Inworld Realtime STT adds support for Voice Profiling. Inworld's STT API now returns a full voice profile alongside the transcript in a single response: emotion (8 categories), vocal style (7 categories), accent, age group, and pitch, each with a confidence score.
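To make the shape of such a response concrete, here is a minimal Python sketch of how a client might keep only the profile attributes it trusts. The payload layout and the field names (`voice_profile`, `label`, `confidence`) and the sample values are assumptions made for illustration, not Inworld's documented schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One profiled attribute: a predicted label plus its confidence score."""
    label: str
    confidence: float


def parse_profile(payload: dict) -> dict[str, Attribute]:
    """Convert a (hypothetical) response payload into typed attributes."""
    return {
        name: Attribute(value["label"], float(value["confidence"]))
        for name, value in payload["voice_profile"].items()
    }


def confident_labels(profile: dict[str, Attribute], threshold: float = 0.7) -> dict[str, str]:
    """Keep only the attributes whose confidence meets the threshold."""
    return {
        name: attr.label
        for name, attr in profile.items()
        if attr.confidence >= threshold
    }


# Hypothetical payload; the keys and values are illustrative only.
example = {
    "transcript": "I'd like to change my order.",
    "voice_profile": {
        "emotion": {"label": "neutral", "confidence": 0.91},
        "vocal_style": {"label": "conversational", "confidence": 0.84},
        "accent": {"label": "en-GB", "confidence": 0.55},
        "age_group": {"label": "adult", "confidence": 0.78},
        "pitch": {"label": "medium", "confidence": 0.62},
    },
}

trusted = confident_labels(parse_profile(example))
```

Filtering on the per-attribute confidence lets downstream logic ignore weak guesses (here, accent and pitch) instead of treating every field as equally reliable.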
Deepgram improves Asia-Pacific STT support. Nova-3 adds Thai, Cantonese (Traditional), Mandarin (Simplified and Traditional), and Gujarati. Accuracy improvements for Bengali, Marathi, Tamil, and Telugu land at the same time.
Gradium AI ranks #1 on Coval TTS Benchmarks. It posts a 158 ms median time-to-first-audio with a 2 ms interquartile range, the tightest latency distribution in the field, and a word error rate of 3.7%. Cartesia Sonic-3 and ElevenLabs Turbo v2.5 follow.
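For readers less familiar with the two latency statistics quoted above, the following Python sketch shows one way to compute a median and an interquartile range from time-to-first-audio samples. The sample values are made up for illustration and are not benchmark data, and Coval's exact quantile method is not stated in the source, so the "inclusive" method used here is an assumption.

```python
import statistics


def latency_summary(samples_ms):
    """Return (median, interquartile range) for latency samples in milliseconds."""
    # quantiles(n=4) returns the three quartile cut points: Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical time-to-first-audio measurements, not real benchmark results.
samples = [156, 157, 158, 158, 158, 159, 160]
median_ms, iqr_ms = latency_summary(samples)
```

A small interquartile range means the middle half of requests land within a narrow band around the median, which matters for voice agents because users notice inconsistent response times more than a slightly higher average.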
LiveKit releases Answering Machine Detection. The feature classifies outbound calls as human, voicemail, IVR, or unavailable within the first second.
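As a rough illustration of how an outbound dialer might act on those four outcomes, here is a small Python sketch. It does not use the LiveKit API: the `CallResult` enum, the `next_action` function, and the action names are all hypothetical, and the policy itself is an assumption.

```python
from enum import Enum


class CallResult(Enum):
    """The four detection outcomes named in the announcement."""
    HUMAN = "human"
    VOICEMAIL = "voicemail"
    IVR = "ivr"
    UNAVAILABLE = "unavailable"


def next_action(result: CallResult) -> str:
    """Map a detection outcome to a (hypothetical) dialer action."""
    actions = {
        CallResult.HUMAN: "start_conversation",
        CallResult.VOICEMAIL: "leave_message",
        CallResult.IVR: "navigate_menu",
        CallResult.UNAVAILABLE: "schedule_retry",
    }
    return actions[result]


# Example: a detector reports the string "voicemail" for a call.
action = next_action(CallResult("voicemail"))
```

Using an enum keeps the mapping exhaustive: an unexpected label raises a `ValueError` when the enum is constructed rather than silently falling through to the wrong branch.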
LiveKit LangChain plugin. The plugin maps LangChain's agent abstraction to LiveKit's Agents framework, so teams get voice channels on existing implementations.
ServiceNow releases EVA-Bench. An evaluation framework specifically for enterprise voice agents. It surfaces failure modes that generic benchmarks miss.
Reading
How to Build In-Vehicle AI Agents (NVIDIA). NVIDIA's engineering blog walks through three hardware architectures for cabin AI. Each supports 7B+ parameter models with sub-500ms latency. The post covers the full stack, from NeMo training to TensorRT-LLM edge deployment.
Releases
LiveKit Agents: v1.5.9-v1.5.10. Introducing Answering Machine Detection, Rime Coda TTS, Speechmatics support in Inference, Perplexity LLM and tons of fixes and small improvements.
Pipecat: v1.2.0-v1.2.1. Smarter turn completion + incomplete turn filtering improvements, OpenAI Realtime reasoning support, and a very long list of fixes and improvements.
TEN Framework: No releases.
