
In brief
- Nvidia unveiled the Nemotron 3 Ultra at Computex on June 1, an open-weight model with 550 billion parameters.
- The model delivers more than 300 tokens per second on a pre-release DeepInfra endpoint, running three to six times faster than its Chinese competitors.
- But Moonshot AI’s Kimi K2.6 still tops the open-weight intelligence rankings.
Jensen Huang walked the Computex stage in Taipei on Sunday, wearing a leather jacket, and unveiled the Nemotron 3 Ultra: Nvidia’s largest open-weight AI model ever and, at least so far, the smartest open-weight model built in America. It’s good. It’s not good enough to beat China.
The model has approximately 550 billion total parameters but uses only 55 billion active parameters at any given moment, thanks to a design called mixture of experts. Parameters are what determine the breadth of an AI model’s knowledge, and a larger number generally means greater capability.
To understand how a mixture-of-experts model works, think of it like a hospital with hundreds of specialists: when a patient comes in, only the relevant doctors actually show up, not the entire staff. This approach keeps the cost of running the model much lower than the total parameter count might suggest, which is exactly why Nvidia claims 5x faster inference and 30% lower cost than similar open-weight alternatives.
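The hospital analogy can be made concrete with a toy routing sketch. This is a minimal illustration of top-k expert routing in general, not Nvidia’s implementation: the expert count, the value of `k`, the router scores, and the helper names `route_top_k` and `moe_layer` are all assumptions chosen so that one expert in ten is active, loosely mirroring the 55-billion-of-550-billion ratio.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts only
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts and combine their outputs by weight."""
    selected = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]

# Hypothetical setup: 10 toy "experts", each a simple scalar function.
experts = [lambda x, s=s: x * (s + 1) for s in range(10)]
rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in experts]  # stand-in for a learned router

output, used = moe_layer(2.0, experts, router_scores, k=1)
active_fraction = len(used) / len(experts)
print(f"experts used: {used}, active fraction: {active_fraction:.0%}")
```

In a real model some parameters (attention layers, for example) are shared by every token, so the active share is not simply `k` divided by the number of experts; the sketch only shows why most experts sit idle for any given input.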
Independent evaluator Artificial Analysis, which partnered with Nvidia on a pre-release evaluation, placed the Nemotron 3 Ultra at 48 on its Intelligence Index, a composite benchmark that combines 10 evaluations including reasoning, coding, general knowledge, and agentic performance, scored on a numeric scale where higher means smarter.
This makes it the top open-weight model in the US by a comfortable margin. The next closest US options are Google’s Gemma 4 31B at 39, Nemotron 3 Super at 36, and OpenAI’s gpt-oss-120b at 33.
NVIDIA just announced the launch of the Nemotron 3 Ultra in Jensen Huang’s keynote at Computex: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and the most intelligent open-weight model in the US
We have partnered with @nvidia to evaluate this model for… pic.twitter.com/WPXZGLBOn8
– Artificial Analysis (@ArtificialAnlys) June 1, 2026
The gap with its predecessor is striking. Released in March 2026 with 120 billion parameters, Nemotron 3 Super is considered a powerful open model for autonomous agents. The Ultra scores 12 points above it, which is a big jump on this benchmark.
What is the Nemotron family?
Nvidia has been in the model business for longer than most people realize. The first Nemotron-branded model was launched in November 2023, with the third generation announced in December 2025.
The suite comes in three sizes: Nano for lightweight tasks, Super for mid-range enterprise applications, and Ultra for complex reasoning workloads. All three share the same hybrid architecture that combines Mamba-2 layers, standard transformer attention, and mixture-of-experts routing.
Mamba-2 is an alternative to standard attention that processes long sequences at a fraction of the cost, which is convenient when you want a model capable of holding a million tokens in memory at once. Nemotron 3 Ultra supports a 1-million-token context window, which means an agent can, in theory, keep an entire large codebase or hundreds of research documents in context simultaneously.
The Ultra model also includes a technology called multi-token prediction (MTP), which allows the model to predict several future tokens at once instead of one token at a time, speeding up the generation process. All three Nemotron 3 models were post-trained using reinforcement learning across multiple interactive environments, teaching them how to plan and carry out multi-step tasks rather than just answering questions.
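A back-of-the-envelope sketch shows why predicting several tokens per step speeds up generation. It assumes a best case in which every predicted token is kept; the article does not say how many tokens the Ultra predicts per step, so the value 3 and the helper `decode_steps` are purely illustrative.

```python
import math

def decode_steps(num_tokens, tokens_per_step):
    """Forward passes needed if each pass emits `tokens_per_step` tokens."""
    return math.ceil(num_tokens / tokens_per_step)

RESPONSE_TOKENS = 1000  # hypothetical response length

one_at_a_time = decode_steps(RESPONSE_TOKENS, 1)    # classic next-token decoding
three_at_a_time = decode_steps(RESPONSE_TOKENS, 3)  # assumed 3 tokens per step

print(f"1 token per step:  {one_at_a_time} passes")
print(f"3 tokens per step: {three_at_a_time} passes")
```

In practice the extra predictions are typically checked before being accepted and some are discarded, so the real saving is smaller than this upper bound.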
Ultra’s weights are being made public and its training recipes are being released. Do you need a supercomputer to run it? Basically, yes: a 550-billion-parameter model is data center territory. But you can access it through Nvidia’s API or cloud providers without owning the hardware yourself, in the same way that someone would already use ChatGPT or Claude through the browser.
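A rough estimate shows why 550 billion parameters lands in data center territory. The sketch below counts weight storage only, and the precisions are assumptions, since the article does not say which numeric format the released weights use. Although only 55 billion parameters are active per token, a mixture-of-experts model generally still needs all of its experts loaded.

```python
TOTAL_PARAMS = 550e9  # total parameter count reported in the article

def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumed precisions; activations and caches would add to these figures.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Even at an assumed 4 bits per parameter, the weights alone come to roughly 275 GB, far beyond a single consumer graphics card.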
Fast model, slower brain
Speed is where the Nemotron 3 Ultra really stands out. On a pre-release DeepInfra endpoint, the model delivered more than 300 output tokens per second. The Chinese models in its intelligence class, DeepSeek V4 Pro and Kimi K2.6, are served at a rate of 50 to 100 tokens per second through their commercial APIs today. This speed gap is important for real-world deployments, especially for autonomous agents executing long, multi-step tasks where the wait for each step adds up quickly.
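To see how the per-step wait adds up, here is a small sketch using the throughput figures quoted above. The agent workload (20 steps of 1,500 output tokens each) is a hypothetical example, and the calculation ignores prompt processing, network latency, and time spent in tools.

```python
def generation_seconds(total_tokens, tokens_per_second):
    """Time spent purely on generating output tokens."""
    return total_tokens / tokens_per_second

STEPS = 20               # hypothetical number of agent steps
TOKENS_PER_STEP = 1500   # hypothetical output tokens per step
total_tokens = STEPS * TOKENS_PER_STEP

# Rates from the article; 300 is a lower bound ("more than 300").
rates = {
    "Nemotron 3 Ultra (300 tok/s)": 300,
    "Competitor, high end (100 tok/s)": 100,
    "Competitor, low end (50 tok/s)": 50,
}

for label, rate in rates.items():
    print(f"{label}: {generation_seconds(total_tokens, rate):.0f} s")
```

Under these assumptions the same task takes about 100 seconds at 300 tokens per second, versus 300 to 600 seconds at 50 to 100 tokens per second, which matches the three-to-six-times figure.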
But raw speed doesn’t decide the intelligence contest. The chart published by Artificial Analysis tells the actual story. On the vertical axis, intelligence, Nemotron 3 Ultra scores 48, which is good, but China’s Kimi K2.6 from Moonshot AI scores 54. The six-point gap in the index represents a meaningful difference: Kimi K2.6 was released in April 2026 and currently ranks fourth among all AI models globally, closed or open, just three points behind leaders Anthropic, Google and OpenAI, all tied at 57.
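The index gaps quoted in this article can be lined up in a few lines of code. The scores are the ones reported above; the dictionary layout and labels are just an illustrative way to organize them.

```python
# Intelligence Index scores as reported in the article.
scores = {
    "Closed leaders (Anthropic, Google, OpenAI)": 57,
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

ultra = scores["Nemotron 3 Ultra"]

# Positive gap: the model scores above the Ultra; negative: below it.
gaps = {name: score - ultra for name, score in scores.items()
        if name != "Nemotron 3 Ultra"}

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:+d} points vs Nemotron 3 Ultra")
```

The Ultra’s lead over the next US open-weight model (9 points) is larger than its deficit to Kimi K2.6 (6 points).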
The open-weight situation in the United States is not new. Chinese labs are flooding the open ecosystem with powerful models, while American companies like OpenAI, Anthropic, and Google keep their best systems behind APIs. As Decrypt reported in March, Chinese open-source models jumped from about 1.2% of global open-source use in late 2024 to about 30% by the end of 2025. Nvidia is the biggest American name actively trying to reverse this trend, with a publicly disclosed five-year plan to spend $26 billion on developing open-weight artificial intelligence.
The Nemotron 3 Ultra is the most visible result of this bet yet. Nvidia also announced that it is already working on Nemotron 4, the next generation, developed through the Nemotron Coalition, a group of eight AI labs including Mistral AI and Perplexity that Nvidia brought together in March 2026 to co-develop open frontier models on DGX Cloud infrastructure. The Nemotron 3 Ultra will ship on June 4.





