Best Multimodal AI Models

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal

🔒 proprietary📏 128K ctx💰 $2.50

May 13, 2024View details →

🌐

OpenAI: GPT-4o-mini

OpenAI

🔥 78

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal

🔒 proprietary📏 128K ctx💰 $0.15

Jul 18, 2024View details →

🌐

Google: Gemini 3.1 Pro Preview Custom Tools

Google

🔥 78

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.

🔒 proprietary📏 1.0M ctx💰 $2.00

Feb 25, 2026View details →

🌐

OpenAI: GPT-4o-mini (2024-07-18)

OpenAI

🔥 78

🔒 proprietary📏 128K ctx💰 $0.15

Jul 18, 2024View details →

🌐

Google Gemini Pro Latest

~google

🔥 78

This model always redirects to the latest model in the Google Gemini Pro family.

🔒 proprietary📏 1.0M ctx💰 $2.00

Apr 27, 2026View details →

🌐

xAI: Grok 4.20 Multi-Agent

xAI

🔥 75

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior: - low / medium: 4 agents - high / xhigh: 16 agents

🔒 proprietary📏 2.0M ctx💰 $2.00

Mar 31, 2026View details →

🌐

Google: Gemini 2.5 Flash Lite Preview 09-2025

Google

🔥 72

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

🔒 proprietary📏 1.0M ctx💰 $0.10

Sep 25, 2025View details →

🌐

OpenAI: GPT-4o (2024-11-20)

OpenAI

🔥 72

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

🔒 proprietary📏 128K ctx💰 $2.50

Nov 20, 2024View details →

🌐

Apple Intelligence

Apple

🔥 66

Apple's on-device AI system powering Siri, writing tools, and image features across iPhone, iPad, and Mac.

🔒 proprietary

Oct 28, 2024View details →

🌐

Google: Gemini 2.5 Flash

Google

🔥 66

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

🔒 proprietary📏 1.0M ctx💰 $0.30

Jun 17, 2025View details →

🌐

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

Google

🔥 66

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding. It offers industry-leading text rendering in images (including long passages and multilingual layouts), consistent multi-image blending, and accurate identity preservation across up to five subjects. Nano Banana Pro adds fine-grained creative controls such as localized edits, lighting and focus adjustments, camera transformations, and support for 2K/4K outputs and flexible aspect ratios. It is designed for professional-grade design, product visualization, storyboarding, and complex multi-element compositions while remaining efficient for general image creation workflows.

🔒 proprietary📏 66K ctx💰 $2.00

Nov 20, 2025View details →

🌐

Google: Gemini 3.1 Flash Lite Preview

Google

🔥 64

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

🔒 proprietary📏 1.0M ctx💰 $0.25

Mar 3, 2026View details →

🌐

Google: Gemini 3.1 Pro Preview

Google

🔥 63

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

🔒 proprietary📏 1.0M ctx💰 $2.00

Feb 19, 2026View details →

🌐

NVIDIA: Nemotron 3 Nano Omni (free)

Nvidia

🔥 59

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

🔒 proprietary📏 256K ctx

Apr 28, 2026View details →

🌐

Google: Gemini 2.5 Flash Lite

Google

🔥 58

🔒 proprietary📏 1.0M ctx💰 $0.10

Jul 22, 2025View details →

🌐

Google: Gemini 2.0 Flash Lite

Google

🔥 56

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices.

🔒 proprietary📏 1.0M ctx💰 $0.07

Feb 25, 2025View details →

Browse More

AI Reasoning Models Text Generation Models AI Models for Coding AI Image Generation Models AI Video Generation Models AI Audio & Speech Models Embedding Models for RAG Open Source AI Models Free AI Models & APIs Cheapest AI Model APIs Long Context AI Models