Best AI Audio & Speech Models

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder that produces more natural-sounding voices and keeps the voice more consistent. Input is priced at $0.60 per million tokens and output at $2.40 per million tokens.

🔒 proprietary · 📏 128K ctx · 💰 $0.60

Jan 19, 2026
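The per-token prices quoted for this model ($0.60 input, $2.40 output, per million tokens) can be turned into a rough per-request estimate. The following is a minimal sketch, not an official calculator: `estimate_cost` is a hypothetical helper, the token counts are illustrative, and it assumes flat per-token billing with no minimums, caching discounts, or separate audio-token rates.

```python
from decimal import Decimal

# Prices quoted in the listing, in USD per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.60")
OUTPUT_PRICE_PER_MILLION = Decimal("2.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat per-token billing; real invoices may differ.
    """
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


# Illustrative request: 50,000 input tokens and 10,000 output tokens.
cost = estimate_cost(input_tokens=50_000, output_tokens=10_000)
print(f"${cost:.4f}")  # prints "$0.0540"
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding noise.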


Whisper Large v3

OpenAI

🔥 60

Battle-tested speech recognition. Widely deployed, well-supported, excellent accuracy.

📊 2B · 🔓 open · 💰 $0.006/min

Nov 6, 2023
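The $0.006/min price listed for Whisper Large v3 scales with audio duration. A minimal sketch of that arithmetic follows; `transcription_cost` is a hypothetical helper, and it assumes pro-rata billing by the second, whereas a real provider may round durations up or apply minimums.

```python
from decimal import Decimal

# Price quoted in the listing, in USD per minute of audio.
PRICE_PER_MINUTE = Decimal("0.006")
SECONDS_PER_MINUTE = Decimal(60)


def transcription_cost(duration_seconds: int) -> Decimal:
    """Estimate the USD cost of transcribing a clip (hypothetical helper).

    Assumes pro-rata billing with no rounding of the duration.
    """
    minutes = Decimal(duration_seconds) / SECONDS_PER_MINUTE
    return minutes * PRICE_PER_MINUTE


# One hour of audio is 3,600 seconds.
print(f"${transcription_cost(3600):.2f}")  # prints "$0.36"
```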


OpenAI: GPT-4o Audio

OpenAI

🔥 58

The gpt-4o-audio-preview model accepts audio inputs as prompts. This lets the model detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio is priced at $40 per million input tokens and $80 per million output tokens.

🔒 proprietary · 📏 128K ctx · 💰 $2.50

Aug 15, 2025


OpenAI: GPT Audio

OpenAI

🔥 41

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder that produces more natural-sounding voices and keeps the voice more consistent. Audio is priced at $32 per million input tokens and $64 per million output tokens.

🔒 proprietary · 📏 128K ctx · 💰 $2.50

Jan 19, 2026
