โ† Back to all models
๐ŸŒ

Xiaomi: MiMo-V2-Omni

xiaomi · Multimodal

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited to complex real-world tasks that span modalities. It supports a 256K-token context window.

#text+image+audio+video->text #top-provider

Undisclosed

Parameters

262K tokens

Context Window

Proprietary

License

Mar 18, 2026

Released

Pricing

Input

$0.40

per 1M tokens

Output

$2.00

per 1M tokens
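
As a rough illustration of how the listed rates translate into per-request cost, here is a minimal Python sketch. It assumes flat billing at the two prices shown above, with no tiering, caching discounts, or modality-specific surcharges, none of which are stated on this page. The function name `estimate_cost` is hypothetical and not part of any official SDK.

```python
# Listed prices in USD per 1M tokens (taken from the pricing section above).
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under flat per-token billing.

    Assumption: every token is billed at the listed rate; real invoices
    may differ if the provider applies discounts or minimums.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long generations can dominate the bill even when the prompt is large.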

API Available

This model is accessible via API for integration into your applications.