Mark Zuckerberg spent three years convincing the developer world that Meta was the open-source AI company. Llama 2, Llama 3, the permissive licensing — it all built a genuine ecosystem of people who chose Meta's models specifically because they could self-host, fine-tune, and ship without an API bill. Last week, Meta thanked them by going proprietary.

What Actually Happened

On April 8, Meta Superintelligence Labs — the new division led by Alexandr Wang, whom Meta effectively acquired when it paid $14.3 billion for a 49% stake in Scale AI — released Muse Spark. It's a natively multimodal reasoning model with tool use, visual chain of thought, and a feature called "Contemplating mode" that orchestrates multiple agents reasoning in parallel.

The important part isn't what the model does. It's what it doesn't do: ship weights.

No open weights. No fine-tuning. No self-hosting. You access it through the Meta AI assistant at meta.ai or through a private API preview that requires a Facebook or Instagram login. The Register put it best: "As open as Zuckerberg's private school."

A Weird Fourth Place

A Meta executive told Axios straight up that the model "doesn't mark a new state of the art." Strange way to debut a product built on $14.3 billion in investment and nine months of work from a team headhunted across OpenAI, Anthropic, and Google. But the benchmarks reveal something more nuanced than just "fourth place":

Benchmark              Muse Spark   GPT-5.4   Gemini 3.1 Pro   Claude Opus 4.6
Intelligence Index     52           57        57               53
HealthBench Hard       42.8         40.1      20.6             –
Humanity's Last Exam   50.2%        43.9%     –                –
Terminal-Bench 2.0     59.0         75.1      68.5             –
ARC-AGI-2              42.5         ~76       ~76              –
CharXiv Reasoning      86.4         82.8      80.2             –

The model is genuinely excellent at scientific reasoning and medical AI — those HealthBench and Humanity's Last Exam scores beat every competitor. Visual analysis is strong too. But the coding story is painful. That Terminal-Bench gap — 16 points behind GPT-5.4 — is devastating for a model that needs developer adoption. ARC-AGI-2 is worse: a 33-point deficit on abstract reasoning that no amount of marketing can paper over.

The efficiency story partially redeems things. Muse Spark consumed 2.7x fewer output tokens than Claude Opus 4.6 across the full eval suite, and about half what GPT-5.4 needed. If you're running inference at scale and your workload skews scientific rather than code-heavy, that's a real advantage. But "efficient at things developers don't primarily need" is a tough pitch.
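To make the efficiency claim concrete, here is a back-of-envelope cost comparison. Only the token ratios (2.7x fewer output tokens than Claude Opus 4.6, roughly half of GPT-5.4) come from the eval results above; the workload size and the flat per-token price are hypothetical placeholders, chosen to isolate the token-efficiency effect from pricing differences.

```python
def output_cost(tokens: float, price_per_million: float) -> float:
    """Cost in dollars for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million

# Suppose a scientific-analysis workload where Claude Opus 4.6 emits
# 10M output tokens. The ratios quoted above then imply:
claude_tokens = 10_000_000
muse_tokens = claude_tokens / 2.7   # 2.7x fewer than Claude Opus 4.6
gpt_tokens = muse_tokens * 2        # Muse uses about half of GPT-5.4

# Hypothetical flat price of $10 per million output tokens for all three.
price = 10.0
costs = {
    "Muse Spark": output_cost(muse_tokens, price),
    "GPT-5.4": output_cost(gpt_tokens, price),
    "Claude Opus 4.6": output_cost(claude_tokens, price),
}
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model:16s} ${cost:,.2f}")
```

At real list prices the gap would shift, but the shape holds: on token-heavy scientific workloads, the efficiency edge compounds; on coding workloads, the quality gap dominates.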

What This Kills for Llama Developers

The appeal of Llama was never raw performance. It was control.

Self-hosting is gone. If you fine-tuned Llama 3 for a medical app, a legal search tool, an internal coding assistant — you can't do anything equivalent with Muse Spark. You can't run it on your own GPUs, inspect the weights, control latency, or guarantee data residency. You're back to being an API customer, same as everyone using Claude or GPT.

The entire LoRA and QLoRA ecosystem that sprouted around Meta's models? Irrelevant for this family. Fine-tuning is completely off the table.
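For readers who haven't worked with it, the technique being locked out is simple to sketch. LoRA freezes the pretrained weight matrix and trains only a low-rank update alongside it — which is exactly why it needs weight access. A minimal numpy illustration of the idea (shapes are illustrative, not from any particular Llama layer):

```python
import numpy as np

d_out, d_in, rank = 4096, 4096, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, rank x d_in
B = np.zeros((d_out, rank))                   # trainable, zero-init so the
                                              # adapter starts as a no-op

x = rng.standard_normal(d_in)

# Forward pass with the adapter: base output plus the low-rank correction.
y = W @ x + B @ (A @ x)

# With B initialized to zero, the adapted model matches the base model.
assert np.allclose(y, W @ x)

# The payoff: trainable parameters drop from d_out * d_in
# to rank * (d_in + d_out).
full = d_out * d_in           # 16,777,216
lora = rank * (d_in + d_out)  # 65,536
print(f"trainable params: {lora:,} vs {full:,} ({full // lora}x fewer)")
```

None of this works without the frozen W on your own hardware — which is the entire point of the complaint.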

Then there's authentication. The new model requires a Meta account — Facebook or Instagram. For developers building privacy-sensitive applications, routing through a social media login to reach your foundation model is a non-starter. I know multiple teams that explicitly chose Llama because they didn't want user data touching a big-tech API. Those teams are now scrambling.

The community on r/LocalLLaMA isn't waiting around for Meta's vague promise to "eventually" release weights. The migration is already underway: Claude Code for coding workflows, Gemma 4 and Qwen 3.6 for self-hosted inference. The open-source ecosystem Meta spent years nurturing is walking out the door.

The Strategic Logic (Briefly)

Three things forced Meta's hand. The Llama 4 benchmark-gaming controversy destroyed the trust that made open weights valuable. Competitors were openly using Llama weights to train knockoff models — adversarial distillation turned generosity into a strategic gift to rivals. And the Scale AI acquisition created a proprietary data pipeline that releasing weights would partially expose.

I buy the distillation argument, and the economics are genuinely brutal at $115-135 billion in annual capex. The Llama 4 excuse feels like revisionism, though: you don't restructure your entire AI division over bad press.

Where to Go Now

April 2026 is a study in divergence. Google shipped Gemma 4 under Apache 2.0 — their most permissive license ever. Zhipu AI dropped GLM-5.1 under MIT. Six major labs now ship competitive open-weight models. The open-source tier of AI development isn't dying; Meta just isn't leading it anymore.

If you built on Llama and need a landing spot: Gemma 4 is the obvious choice. Apache 2.0 licensing, four model sizes including a 27B dense variant, strong multimodal support, and a litert-lm CLI that makes edge deployment genuinely easy. Qwen 3.6 Plus is another solid option if Alibaba's licensing terms work for your use case. Both have maturing tooling ecosystems that Llama's community is already feeding into.
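One defensive pattern this episode rewards, whichever model you land on: keep the model behind a minimal interface so the backend is a one-line swap next time a vendor changes terms. A sketch of that pattern — the class names are illustrative, and a real backend would wrap vLLM, Ollama, or a hosted API behind the same one-method interface:

```python
from typing import Protocol

class ChatBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Stand-in backend for testing; real implementations (a hypothetical
    GemmaBackend, QwenBackend, etc.) would satisfy the same Protocol."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Assistant:
    """Application code depends only on the ChatBackend interface."""
    def __init__(self, backend: ChatBackend) -> None:
        self.backend = backend

    def ask(self, question: str) -> str:
        return self.backend.complete(question)

# Swapping providers means constructing a different backend;
# nothing downstream of Assistant changes.
bot = Assistant(EchoBackend())
print(bot.ask("ping"))  # echo: ping
```

It won't save a fine-tuned checkpoint, but it keeps the blast radius of the next rug-pull to one constructor call.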

Meta says they'll open-source Muse Spark weights eventually. Don't architect around that promise. Build on what ships with a license today.