Ultravox is a real-time voice AI platform built by Fixie.ai, designed from first principles to deliver fast, natural, and scalable voice agent experiences. Conventional voice AI systems transcribe speech to text before passing it to a language model. Ultravox instead processes audio directly through a speech-native model that preserves paralinguistic signals such as tone, cadence, and pitch. Skipping the separate transcription step lowers latency and produces more human-like conversations.
The platform provides developer-friendly REST APIs and SDKs for web and mobile, enabling teams to integrate conversational voice agents into their products with minimal friction. Built-in telephony support connects Ultravox to major telephony providers, and features such as an Outbound Call Scheduler, custom voice clones, and retrieval-augmented generation corpora help teams build and scale sophisticated voice workflows.
Ultravox manages its own full inference stack and dedicated infrastructure, eliminating dependency on external LLMs or shared inference pools. The core model, Ultravox v0.7, achieves a state-of-the-art score of 91.8% on Big Bench Audio with reasoning disabled and 97% with reasoning ("thinking") enabled. The platform also includes UltraVAD, a neural voice activity detection model that predicts turn-taking in real time by recognizing pause patterns and conversation states.
Pricing follows a usage-based model at $0.05 per minute after an initial 30-minute free allocation, with a Pro plan at $100 per month that removes hard concurrency limits and unlocks additional capabilities. Ultravox is built on open-weight models published on Hugging Face, reflecting the company's commitment to open science.
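As a rough illustration of the usage-based pricing described above, the sketch below estimates pay-as-you-go charges from cumulative call minutes. It assumes the 30 free minutes are a one-time allowance deducted from total usage and that partial minutes are billed proportionally; Ultravox's actual rounding and billing rules may differ, and the Pro plan's $100 monthly fee is not modeled. The function name `pay_as_you_go_cost` is hypothetical.

```python
from decimal import Decimal

# Figures taken from the pricing description; billing details are assumptions.
RATE_PER_MINUTE = Decimal("0.05")  # USD per minute
FREE_MINUTES = Decimal("30")       # assumed one-time free allowance


def pay_as_you_go_cost(total_minutes: float) -> Decimal:
    """Hypothetical estimate of usage charges in USD for cumulative minutes.

    Assumes partial minutes are billed proportionally and the result is
    rounded to whole cents; real invoices may round differently.
    """
    minutes = Decimal(str(total_minutes))
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Only usage beyond the free allowance is billable.
    billable = max(Decimal(0), minutes - FREE_MINUTES)
    return (billable * RATE_PER_MINUTE).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for used in (10, 130, 1000):
        print(used, "minutes ->", pay_as_you_go_cost(used), "USD")
```

Under these assumptions, 130 minutes of usage bills 100 minutes, or $5.00, and 1,000 minutes bills 970 minutes, or $48.50. `Decimal` is used instead of floats so that cent amounts are not distorted by binary floating-point rounding.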
- Build real-time voice AI agents that handle natural conversations with users
- Integrate voice capabilities into products using developer REST APIs and SDKs
- Deploy outbound call scheduling workflows for sales or customer support teams
- Create custom voice clones for branded real-time conversational experiences
- Add telephony-connected voice agents to existing inbound call center operations
- Use RAG corpora to give voice agents access to structured knowledge bases
- Prototype and test voice AI experiences in an unlimited playground environment
- Scale voice agent infrastructure without hard concurrency limits on the Pro plan
- Process paralinguistic signals like tone and cadence during real-time audio inference
- Build and deploy speech-native AI applications using open-weight model infrastructure

