The AI industry packed an entire quarter's worth of announcements into a single week, with NVIDIA unveiling its post-Blackwell Rubin architecture, DeepSeek crossing the trillion-parameter mark, and frontier model releases stacking up faster than anyone can benchmark them.


NVIDIA Unveils Rubin: The Post-Blackwell Era Begins

NVIDIA officially unveiled its next-generation Rubin platform at GTC 2026 — a complete AI supercomputer architecture featuring the new Vera CPU, Rubin GPUs with Transformer Engines, and NVLink 6 interconnects. The platform is purpose-built for agentic AI and multi-step reasoning workloads, which signals where NVIDIA thinks the compute demand is heading: not just training larger models, but running autonomous agents that make dozens of tool calls per task and need sustained inference throughput rather than burst training capacity.

Alongside hardware, NVIDIA launched its Agent Toolkit, an open platform for developing autonomous AI agents, including OpenShell for secure runtime environments and the Nemotron 3 Super model — a 120B-parameter hybrid MoE model with 12B active parameters that scores 60.47% on SWE-Bench Verified. The Agent Toolkit is NVIDIA's clearest bet yet that the next wave of GPU demand won't come from training runs alone. Inference-heavy agentic workloads — where a single user request triggers hundreds of sequential model calls — could drive sustained GPU utilization in a way that periodic training jobs never did. If agents become the default interface for enterprise software, NVIDIA wants Rubin to be the floor they run on.
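The claim that one user request triggers many sequential model calls is easy to see in miniature. The sketch below is a toy agent loop, not NVIDIA's Agent Toolkit API; `fake_model_call` and `fake_tool` are hypothetical stand-ins that show why agentic workloads turn a single request into sustained inference traffic.

```python
# Hedged sketch of why agentic workloads are inference-heavy: one request
# fans out into many sequential model calls. Everything here is a stand-in,
# not NVIDIA's Agent Toolkit API.

def fake_model_call(prompt: str) -> str:
    """Stand-in for an LLM inference call; returns a canned 'action'."""
    # Pretend the model keeps requesting a tool until enough steps have run.
    step = prompt.count("observation:")
    return "search" if step < 5 else "final_answer"

def fake_tool(name: str) -> str:
    """Stand-in for a tool execution (web search, shell command, etc.)."""
    return f"result of {name}"

def run_agent(task: str, max_steps: int = 50) -> int:
    """Run a simple observe-act loop; return how many inference calls it made."""
    prompt, calls = task, 0
    for _ in range(max_steps):
        action = fake_model_call(prompt)
        calls += 1
        if action == "final_answer":
            break
        # Each tool result is appended to the context, then the model is
        # called again -- this is the sequential-call pattern in the text.
        prompt += f"\nobservation: {fake_tool(action)}"
    return calls

calls = run_agent("diagnose build failure")
print(f"one request produced {calls} inference calls")
```

Scale the per-step count up to real agent traces with dozens or hundreds of tool calls and the GPU-utilization argument in the paragraph above follows directly.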

DeepSeek V4: 1 Trillion Parameters, 32B Active

DeepSeek V4 arrived in early March with a staggering 1 trillion total parameters while keeping only 32 billion active per token through its mixture-of-experts architecture. The efficiency play is significant: near-frontier performance at a fraction of the compute cost. What makes V4 noteworthy beyond the raw parameter count is the training infrastructure behind it. DeepSeek has been open about its custom training framework and the architectural choices that let it scale MoE to this size without the instability that plagued earlier attempts. At 32B active parameters per token, the per-query cost stays manageable even at scale, which makes V4 a serious contender for self-hosted enterprise deployments where API costs at OpenAI or Anthropic would balloon quickly. The open-weights release continues DeepSeek's pattern of forcing the price of intelligence downward.
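The "1T total, 32B active" arithmetic comes from top-k expert routing: each token is sent to only a few experts, so most parameters sit idle per query. The sketch below uses tiny made-up dimensions and a generic top-k router purely for illustration; it is not DeepSeek's actual architecture or config.

```python
import numpy as np

# Toy top-k mixture-of-experts layer. Dimensions and expert counts are
# illustrative only, not DeepSeek V4's real configuration.
rng = np.random.default_rng(0)

d_model = 64      # hidden size (toy scale)
n_experts = 16    # total experts in the layer
top_k = 2         # experts activated per token

# Each expert is a small two-layer feed-forward net; only top_k run per token.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
     rng.standard_normal((4 * d_model, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route a single token vector through its top_k highest-scoring experts."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]          # indices of the chosen experts
    gate = np.exp(logits[top])
    gate /= gate.sum()                         # softmax over the chosen experts
    out = np.zeros_like(x)
    for w, idx in zip(gate, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU FFN expert, gated
    return out, top

token = rng.standard_normal(d_model)
out, chosen = moe_layer(token)

# Only top_k / n_experts of the expert parameters touched this token,
# which is how total and active parameter counts diverge at scale.
print(f"experts used: {sorted(chosen.tolist())}, "
      f"active fraction: {top_k / n_experts:.2%}")
```

With 1T total and 32B active, the same ratio works out to roughly 3% of parameters touched per token, which is why per-query compute tracks the active count, not the headline one.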

Frontier Model Wars Heat Up

March has been a blockbuster month for model releases. GPT-5.4 dropped in Standard, Thinking, and Pro variants, with the Pro tier showing particularly strong gains on coding benchmarks and multi-step reasoning tasks. Gemini 3.1 Ultra brought native multimodal reasoning with a 1M-token context window, making it the first production model where you can feed an entire codebase and get coherent analysis back. Grok 4.20 shipped with enhanced real-time web access, leaning into xAI's differentiation as the model that knows what happened five minutes ago. And Apple teased an AI-powered Siri overhaul for iOS 26.4, though details remain thin ahead of WWDC.

The pace is making benchmarks nearly useless. By the time a model tops a leaderboard, two competitors have already shipped something that matches it. The real differentiators are shifting from raw capability to ecosystem integration, pricing, and reliability under production load.

Meta Goes Custom: Four New AI Chips Announced

Meta revealed four generations of custom silicon — the MTIA 300, 400, 450, and 500 — in a clear move to reduce dependence on NVIDIA. As AI infrastructure costs balloon, the build-vs-buy calculus is shifting for hyperscalers. Meta's internal projections reportedly show that custom silicon could cut their inference costs by 40-60% compared to equivalent NVIDIA hardware by 2028. The timeline is aggressive, and custom chip programs have a long history of delays, but the financial incentive is impossible to ignore when you're running inference for 3 billion monthly active users.

Medical AI Milestone: DeepRare Outperforms Physicians

DeepRare, an agentic AI system from Shanghai Jiao Tong University, achieved 64.4% diagnostic accuracy for rare diseases — beating experienced specialists who scored 54.6%. The system uses multi-step reasoning to analyze patient histories and symptoms, consulting a knowledge graph of rare disease presentations and working through differential diagnoses the way a specialist would, but without the availability constraints that leave most rare disease patients waiting months for an expert opinion. The gap between AI and specialist performance is small enough that the real value isn't replacing doctors — it's triaging cases so the right patients reach the right specialists faster.

AI Advertising Hits $57 Billion

AI-driven advertising is projected to grow 63% in 2026, reaching $57 billion. Meanwhile, OpenAI has crossed $25 billion in annualized revenue and is eyeing a public listing; Anthropic is close behind at $19 billion. The advertising number matters because it represents the clearest path to AI generating revenue that doesn't come from other AI companies or developer tools: it's traditional business spending being redirected through AI intermediaries.

Open Source Spotlight: LTX 2.3

LTX 2.3 from Lightricks, a 22B-parameter open-source video generation model, now produces 4K video at 50 FPS with synchronized audio, a major leap for accessible creative AI. The synchronized audio generation is the headline feature: previous open-source video models required a separate audio pipeline, adding latency and desync artifacts that made the output feel uncanny. LTX 2.3 handles both streams natively, which brings open-source video generation meaningfully closer to the quality bar set by proprietary tools like Sora and Veo.


Neural Dispatch publishes twice daily, at 8 AM and 8 PM.