Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities - visual grounding, multi-step planning, tool use, and code execution - and supports a 256K-token context window, making it well suited for complex real-world tasks that span modalities.
Parameters: Undisclosed
Context Window: 262K tokens
License: Proprietary
Released: Mar 18, 2026
Pricing
Input: $0.40 per 1M tokens
Output: $2.00 per 1M tokens
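The per-token rates above translate directly into per-request cost. A minimal sketch of that arithmetic, using the rates from the pricing table (the helper name is illustrative, not part of any official SDK):

```python
# Per-million-token rates copied from the pricing table above.
INPUT_RATE_PER_M = 0.40   # USD per 1M input tokens
OUTPUT_RATE_PER_M = 2.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, rounded to the nearest cent."""
    cost = (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M
    return round(cost, 2)

# e.g. a 50K-token prompt that produces a 5K-token reply:
print(estimate_cost(50_000, 5_000))  # → 0.03
```

At these rates, even a prompt that fills most of the 262K-token window costs roughly $0.10 on the input side.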
API Available
This model is accessible via API for integration into your applications.
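The listing does not document the endpoint or request schema, so the following is a sketch under assumptions: it builds a chat-style request payload in the OpenAI-compatible format that many model providers expose, with a mixed text-plus-image user message. Only the model id (MiMo-V2-Omni) comes from this page; the field layout and the `build_request` helper are assumptions.

```python
import json

def build_request(prompt: str, image_url: str) -> dict:
    """Build a chat-completions-style payload (assumed OpenAI-compatible schema).

    Only the model id is taken from this page; everything else is illustrative.
    """
    return {
        "model": "MiMo-V2-Omni",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 512,
    }

payload = build_request("What is shown in this image?", "https://example.com/photo.jpg")
print(json.dumps(payload, indent=2))
```

The same payload shape would be POSTed to whatever chat-completions URL the provider publishes; consult the official API documentation for the actual endpoint and any audio/video content-part types.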
Related Models
GPT-5
OpenAI
OpenAI's latest flagship. Advanced multimodal capabilities with native tool use, improved reasoning, and massive knowledge.
GPT-4o
OpenAI
Omni model supporting text, vision, and audio natively. Fast and capable with strong multimodal understanding.
Gemini 2.5 Pro
Google
Google's most capable model with 1M+ token context, native multimodal understanding, and deep research capabilities.