ERNIE-4.5-VL-28B-A3B-Thinking: A Breakthrough in Multimodal AI
Breaking News · Mar 8, 2026 · 3 min read

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking, Achieving SOTA Multimodal Reasoning with 3B Active Parameters

BEIJING — Baidu on Thursday introduced ERNIE-4.5-VL-28B-A3B-Thinking, a new multimodal reasoning model that delivers state-of-the-art performance on visual reasoning tasks while activating only 3 billion parameters during inference.

The model builds directly on the ERNIE-4.5-VL-28B-A3B architecture and incorporates a 7-billion-parameter vision encoder for image processing. According to Baidu, the upgraded “Thinking” variant represents a significant advance in multimodal reasoning capabilities, outperforming comparable and larger models across multiple visual-language benchmarks.

The release, announced via Baidu’s ERNIE Blog and the model’s Hugging Face repository, positions the Chinese tech giant as a competitive player in open multimodal AI. Baidu claims the model posts benchmark wins over larger systems including GPT-5 and Gemini 2.5 in areas such as visual reasoning, document analysis and complex multimodal understanding.

Technical Architecture and Efficiency

ERNIE-4.5-VL-28B-A3B-Thinking employs a Mixture-of-Experts (MoE) design in which a router sends each token to a small subset of expert sub-networks, so the full 28-billion-parameter model activates only about 3 billion parameters per token. Because per-token compute scales with the active parameters rather than the total, this sparse activation reduces inference compute and cost compared with dense models of similar capability, although all 28 billion weights still have to be stored and available at inference time.
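
The sparse activation described above can be illustrated with a toy top-k router. This is a minimal sketch of generic MoE routing, not Baidu's implementation: the expert count (8), the value of k (2), the router scores and the scalar "experts" are all made-up assumptions, since the announcement does not describe the model's routing details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_scores, k=2):
    """Evaluate only the selected experts on x and mix their outputs."""
    selected = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in selected)
    return output, [i for i, _ in selected]


# Eight hypothetical experts; each simply scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in range(1, 9)]

# Hypothetical router scores for one token (a real router computes these from x).
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

y, used = moe_layer(3.0, experts, router_scores, k=2)
print("experts evaluated:", used, "of", len(experts))
print("mixed output:", y)
```

Only the two selected experts are ever called; the other six contribute no compute for this token, which is the source of the efficiency gain the article describes.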

The system pairs a dedicated 7B vision encoder with the language backbone, targeting tasks that require fine-grained visual understanding and logical reasoning over image content. Benchmark results cited by Baidu indicate strong performance across standard multimodal evaluation suites, but the initial announcement did not include specific numerical scores.

The model is being released openly on Hugging Face under the repository baidu/ERNIE-4.5-VL-28B-A3B-Thinking, allowing developers and researchers to experiment with and build upon the architecture.
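
For readers who want to inspect the repository before committing to a full download, the following is a minimal sketch using the third-party `huggingface_hub` package. It assumes the package is installed and that network access is available; the helper names `repo_url` and `fetch_metadata` are hypothetical, and which files the repository actually contains is not stated in the announcement.

```python
REPO_ID = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"  # repository name given in the article


def repo_url(repo_id=REPO_ID):
    """Build the public Hugging Face page URL for a model repository."""
    return f"https://huggingface.co/{repo_id}"


def fetch_metadata(repo_id=REPO_ID):
    """Download only the repository's JSON files, skipping the large weight files.

    Returns the local directory that holds the downloaded snapshot.
    Requires: pip install huggingface_hub
    """
    # Imported lazily so the rest of this module works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=["*.json"])


print(repo_url())
# To fetch the metadata files (needs network access), call:
#     local_dir = fetch_metadata()
```

Dropping the `allow_patterns` argument would download the whole repository, including the model weights, which for a 28-billion-parameter model is likely to be tens of gigabytes.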

Competitive Context

Baidu’s announcement arrives as global competition in multimodal AI intensifies. The company claims its latest ERNIE model surpasses leading systems from OpenAI and Google in several visual reasoning and document intelligence benchmarks, signaling growing strength in Chinese AI research.

The release follows Baidu’s earlier ERNIE-4.5 series developments and coincides with reports of the company preparing even larger native multimodal models, including ERNIE 5, which is described as capable of natively generating multiple media types.

Industry Impact

For developers and enterprises, the model offers a combination of what Baidu describes as frontier-level multimodal reasoning and significantly lower inference costs, owing to its 3B-active-parameter design. This efficiency could make advanced visual reasoning more accessible for real-world applications in document processing, visual question answering, chart analysis and enterprise knowledge work.
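
A back-of-envelope calculation makes the cost argument concrete. The sketch below uses the article's rounded figures (28B total, 3B active) and two stated assumptions: the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, which ignores attention and vision-encoder costs, and 2 bytes per weight (16-bit precision). The results are rough estimates, not measurements of this model.

```python
TOTAL_PARAMS = 28e9   # total parameters, rounded figure from the article
ACTIVE_PARAMS = 3e9   # parameters activated per token, rounded figure from the article


def active_fraction(total_params, active_params):
    """Share of the model's parameters that participate in processing one token."""
    return active_params / total_params


def approx_flops_per_token(active_params):
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token.

    Assumption: ignores attention over the context and image encoding.
    """
    return 2 * active_params


def weight_memory_gb(total_params, bytes_per_param=2):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes).

    Assumption: 2 bytes per parameter (16-bit weights), no quantization.
    """
    return total_params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
dense_vs_sparse = approx_flops_per_token(TOTAL_PARAMS) / approx_flops_per_token(ACTIVE_PARAMS)

print(f"active share of parameters: {fraction:.1%}")
print(f"estimated compute vs. a dense 28B model: about {dense_vs_sparse:.1f}x less per token")
print(f"weight memory at 16-bit precision: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions the compute saving applies per token, while the memory needed to hold the weights is still set by the full 28 billion parameters, so hardware sizing depends on both numbers.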

The open release on Hugging Face is expected to accelerate experimentation and integration within the broader AI developer community, particularly among organizations seeking strong multimodal capabilities without the infrastructure demands of much larger dense models.

What’s Next

Baidu has not yet disclosed a detailed timeline for further upgrades or the full release of ERNIE 5. Industry observers anticipate additional technical papers and comprehensive benchmark disclosures in the coming weeks to substantiate the claimed performance gains over GPT-5 and Gemini 2.5.

The company is expected to continue expanding the ERNIE ecosystem with improved multimodal generation capabilities and tighter integration into its cloud and enterprise AI offerings. As more third-party evaluation results emerge, the precise positioning of ERNIE-4.5-VL-28B-A3B-Thinking relative to closed frontier models will become clearer.

The model is available for immediate download and testing via Hugging Face.

Original Source

yiyan.baidu.com
