The most important thing Google shipped with Gemma 4 isn't a model. It's a license.

On April 2, Google DeepMind dropped four open-weight models — E2B, E4B, 26B MoE, and 31B Dense — all under Apache 2.0. Not a custom "open" license with usage caps. Not a community license that gets weird above 700 million monthly users. The actual Apache 2.0, the same license that covers Kubernetes, Spark, and Kafka.

I honestly think most of the coverage buried the lede. Everyone rushed to talk about benchmarks. But the license is what makes this a before-and-after moment for anyone building production AI on open weights.

Why Apache 2.0 Is the Real Headline

Every previous Gemma release shipped under Google's proprietary license with termination rights and specific usage restrictions. Llama 4? Meta's community license still requires a separate agreement if you cross 700 million monthly active users. Mistral's models have their own commercial terms. Even DeepSeek and GLM, while permissive in practice, use licenses that occasionally trip up enterprise legal departments.

Apache 2.0 changes the conversation entirely. You can fine-tune the model, ship it in a product, sublicense a modified version, embed it in proprietary software — all without calling a lawyer. No revenue caps, no geographic restrictions, no acceptable use policies beyond what the law already requires.

For startups, this is a bigger deal than any benchmark number. I've seen teams choose cloud APIs over self-hosting specifically because the legal review on model licenses took longer than the integration work itself. A custom AI license means weeks of back-and-forth with legal. Apache 2.0 is a known quantity — most companies have it pre-approved. Google is betting they can win on volume by removing the last excuse teams had to avoid open-weight models in production, and honestly, that's a smart bet.

The 26B MoE at 3.8B Active Parameters Is Wild

The technical story is genuinely impressive, but the number that keeps grabbing me is 3.8 billion. That's the active parameter count during inference for the 26B Mixture of Experts variant. The model has 26 billion total parameters, but the router activates only about 3.8 billion of them for any given token.
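If the MoE mechanics are unfamiliar, a toy top-k router shows why so few parameters run per token. Everything below is invented for illustration (expert count, dimensions, routing function); Gemma 4's actual architecture details aren't public in this post:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route one token through a top-k Mixture of Experts layer.

    Only the top_k selected expert MLPs run for this token, so the
    parameters actually used are a small fraction of the total.
    """
    logits = router_w @ x                    # one routing score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the selected experts' parameters touch this token.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
# Each "expert" is just a linear map here; real experts are MLP blocks.
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))

y = moe_forward(rng.normal(size=d), experts, router_w)
print(y.shape)  # (8,)
# With 2 of 16 experts active, ~12.5% of expert parameters run per token,
# the same mechanism that gets 3.8B active out of 26B total (~15%).
```

The routing decision is made independently for every token, which is why the memory footprint is the full 26B while the compute cost tracks the 3.8B.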

With those 3.8B active parameters, the 26B MoE scores 88.3% on AIME 2026 math, 77.1% on LiveCodeBench v6, and lands at #6 on the Arena AI text leaderboard. That's more than double what the previous full-size Gemma could manage on math. You're getting performance that competes with models 5–10x larger at a fraction of the compute cost.

The 31B dense model is the quality king — #3 on Arena AI, 89.2% AIME, 80.0% LiveCodeBench — but the 26B MoE is the one I'd pick for most production workloads. The quality gap is small. The efficiency gap isn't.

On-Device Is Actually Real This Time

The E2B and E4B are built for local inference, and the Hacker News threads suggest developers are already running Gemma 4 on M3 Pros for real workloads. The deployment ecosystem showed up ready from day one: Ollama, llama.cpp, MLX, vLLM, NVIDIA NIM, Keras, and Vertex AI all have support.

Fine-tuning is similarly straightforward. Hugging Face's integration uses AutoModelForMultimodalLM with built-in chat templates, and QLoRA works out of the box. Unsloth published a local fine-tuning guide. No three-week wait for library support this time.
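Gemma 4's exact chat template isn't spelled out here, but Hugging Face multimodal models generally take a messages list shaped like the following (structure assumed from the common convention; field names may differ for Gemma 4):

```python
# Hypothetical multimodal chat payload in the usual Hugging Face shape.
messages = [
    {"role": "user",
     "content": [
         {"type": "image", "url": "https://example.com/chart.png"},
         {"type": "text", "text": "Summarize this chart."},
     ]},
]

# A processor would normally render this with the model's built-in template,
# e.g. processor.apply_chat_template(messages, add_generation_prompt=True).
# Here we just sanity-check the payload shape.
assert all(turn["role"] in {"user", "assistant", "system"} for turn in messages)
text_parts = [part["text"]
              for turn in messages
              for part in turn["content"]
              if part["type"] == "text"]
print(text_parts)  # ['Summarize this chart.']
```

The same messages structure feeds both inference and supervised fine-tuning datasets, which is what makes the QLoRA path work without custom preprocessing.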

Multimodal Without the Preprocessing Tax

Every Gemma 4 variant handles images natively with variable aspect ratios — the vision encoder uses 2D positional encoding with multidimensional RoPE and lets you configure token budgets from 70 to 1,120 per image. No resize-and-pad preprocessing. The 26B and 31B models process video up to 60 seconds at 1 fps, and the smaller E2B and E4B get native audio input for speech recognition.
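To make that token budget concrete, here's an illustrative sketch of clamping an area-proportional token count into the 70 to 1,120 range; the tokens-per-pixel rate is made up, not Gemma 4's actual rule:

```python
def image_token_budget(width, height, tokens_per_pixel=1 / 4096,
                       lo=70, hi=1120):
    """Illustrative only: scale token count with image area, then clamp
    to the configurable 70-1120 range described above. The
    tokens_per_pixel constant is invented for the example."""
    raw = int(width * height * tokens_per_pixel)
    return max(lo, min(hi, raw))

print(image_token_budget(256, 256))    # 70 (clamped to the floor)
print(image_token_budget(1024, 1024))  # 256
print(image_token_budget(4096, 2160))  # 1120 (clamped to the cap)
```

The point of a configurable budget is the trade-off it exposes: fewer tokens per image means cheaper inference, more tokens means finer visual detail, and the cap keeps a huge image from blowing up the context.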

All four models support function calling, structured JSON output, and an extended thinking mode. Google is positioning Gemma 4 as the open-weight foundation for agentic workflows — not just another chat model.
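As a sketch of what consuming that structured output looks like on the application side (the schema shape follows the common JSON-schema-style convention; Gemma 4's concrete function-calling format isn't documented in this post):

```python
import json

# Hypothetical tool definition in the JSON-schema style most
# open-weight models converge on.
tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A structured reply the model might emit in JSON mode (invented example).
raw_reply = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(raw_reply)
assert call["name"] == tool["name"]
missing = [key for key in tool["parameters"]["required"]
           if key not in call["arguments"]]
assert not missing, f"missing required arguments: {missing}"
print(call["arguments"]["city"])  # Berlin
```

This validate-before-dispatch step is the unglamorous core of agentic workflows: the model proposes a call, your code checks it against the schema, and only then does anything actually execute.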

Where It Sits in the Open-Weight Landscape

| Model | Active Params | License | Context | Notable |
| --- | --- | --- | --- | --- |
| Gemma 4 31B | 31B (dense) | Apache 2.0 | 256K | #3 Arena AI |
| Gemma 4 26B MoE | 3.8B | Apache 2.0 | 256K | #6 Arena AI |
| Llama 4 Maverick | 17B of 400B | Meta Community | 10M | Largest context window |
| GLM-5.1 | 40B of 744B | MIT | 200K | #1 SWE-Bench Pro |
| DeepSeek V4 | ~60B of 600B | DeepSeek License | 128K | Strong coding |

Six labs now ship competitive open-weight models. The quality spread between them has compressed to the point where licensing, deployment efficiency, and ecosystem support matter as much as raw scores. Google's play is to be the model you never think twice about, legally or operationally.

What You Should Actually Try

If you're self-hosting Llama or Mistral, the 26B MoE deserves a swap test. That efficiency-to-quality ratio at 3.8B active parameters is hard to beat, and moving to Apache 2.0 simplifies your compliance story immediately.

If you've been API-only because open models felt like a legal headache, this is the on-ramp. Start with E4B for prototyping, move to 26B MoE for production, keep 31B dense in your back pocket for maximum quality tasks.

The open-weight race stopped being about who has the biggest model months ago. It's about who makes it easiest to actually ship things. Right now, Google just made that case a lot harder to argue against.