The most affordable AI model APIs ranked by cost per million tokens.
Last updated: April 2026
Anthropic
Anthropic's most powerful reasoning model with extended thinking. Excels at complex analysis, multi-step math, advanced coding, and nuanced writing.
OpenAI
OpenAI's latest flagship. Advanced multimodal capabilities with native tool use, improved reasoning, and broad world knowledge.
Balanced intelligence and speed. Strong reasoning with faster response times and lower cost than Opus.
Google
Google's most capable model with 1M+ token context, native multimodal understanding, and deep research capabilities.
OpenAI's most powerful reasoning model. Uses chain-of-thought to solve complex math, science, and coding problems.
DeepSeek
671B-parameter MoE model with only 37B parameters active per token. Open-weight, excels at math, coding, and Chinese-language tasks.
Reasoning-specialized model trained with reinforcement learning. Shows chain-of-thought reasoning transparently.
The model that defined a generation. Fast, smart, and incredibly capable across coding, analysis, and creative tasks.
Ultra-fast multimodal model optimized for speed. Great balance of capability and latency.
Black Forest Labs
State-of-the-art image generation with photorealistic outputs, excellent prompt adherence, and fast inference.
Omni model supporting text, vision, and audio natively. Fast and capable with strong multimodal understanding.
Moonshot AI
1T+ parameter MoE model with strong long-context handling and multi-step reasoning. Open weights, competitive with top models.
Cost-effective reasoning model. Provides strong reasoning capabilities at a fraction of o3's cost.
OpenAI's text-to-video model. Generates high-quality, coherent videos up to one minute long from text prompts.
Perplexity
Sonar Pro Search, Perplexity's most advanced agentic search system, is available exclusively through the OpenRouter API and powers the Pro Search mode on the Perplexity platform. It extends Sonar Pro with autonomous, multi-step reasoning: instead of a single query plus synthesis, it plans and executes entire research workflows using tools, making it well suited to deeper reasoning and analysis. Pricing is token-based plus $18 per thousand requests.
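To make the integration concrete, here is a minimal sketch of querying the model through OpenRouter's OpenAI-compatible chat-completions endpoint. The model slug perplexity/sonar-pro-search is an assumption; check OpenRouter's model list for the exact identifier.

```python
import os
import requests

# Minimal sketch: Sonar Pro Search via OpenRouter's OpenAI-compatible
# chat-completions endpoint. The model slug below is an assumption.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "perplexity/sonar-pro-search",  # assumed slug; verify on OpenRouter
        "messages": [
            {"role": "user",
             "content": "Summarize this week's major open-weight model releases."}
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Note that the request surcharge works out to $0.018 per call ($18 / 1,000 requests) on top of token costs, so batching research questions into fewer, richer requests can meaningfully lower the bill.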
Alibaba
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.
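As a usage illustration, the sketch below sends an image plus a text instruction using the OpenAI-compatible message format that most Qwen-serving stacks (for example, a local vLLM server) accept. The base URL, API key, and model ID are assumptions about your deployment.

```python
from openai import OpenAI

# Sketch: vision request against an OpenAI-compatible endpoint serving
# Qwen3-VL-8B-Instruct. base_url, api_key, and model ID are assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-8B-Instruct",  # assumed Hugging Face-style ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/invoice.png"}},  # placeholder image
            {"type": "text",
             "text": "Extract the invoice number and the total amount."},
        ],
    }],
)
print(response.choices[0].message.content)
```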
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1M-token context window, offering a balanced combination of performance, speed, and cost.
Qwen2.5-Coder-7B-Instruct is a 7B parameter instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE, SwiGLU, RMSNorm, and GQA attention with support for up to 128K tokens using YaRN-based extrapolation. It is trained on a large corpus of source code, synthetic data, and text-code grounding, providing robust performance across programming languages and agentic coding workflows. This model is part of the Qwen2.5-Coder family and offers strong compatibility with tools like vLLM for efficient deployment. Released under the Apache 2.0 license.
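Given the model card's pointer to vLLM, here is a minimal offline-inference sketch, assuming a recent vLLM release that includes the LLM.chat helper; the Hugging Face model ID is the standard one for this checkpoint.

```python
from vllm import LLM, SamplingParams

# Sketch: offline inference with vLLM. Assumes a recent vLLM version
# that provides the LLM.chat helper (applies the chat template for you).
llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.chat(
    [{"role": "user",
      "content": "Write a Python function that checks whether a string is a palindrome."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```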
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.
Reka
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a 32K context length and optimized through reinforcement learning (RLOO), it provides competitive performance comparable to proprietary models within a smaller parameter footprint. Ideal for low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to 11GB at 4-bit precision), and employs explicit reasoning tags ("<reasoning>") to indicate its internal thought process. Reka Flash 3 is primarily an English model with limited multilingual understanding capabilities. The model weights are released under the Apache 2.0 license.
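Because the reasoning trace is delimited by explicit tags, downstream code can separate it from the visible answer. A minimal sketch, assuming the model emits a single <reasoning>…</reasoning> span before the response:

```python
import re

# Hedged helper: split Reka Flash 3 output into its <reasoning> trace and
# the final answer. Assumes one <reasoning>...</reasoning> span per reply.
def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<reasoning>21 billion params fits in ~11GB at 4-bit.</reasoning>Yes, it fits."
)
print(answer)  # -> "Yes, it fits."
```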