Nicholas Carlini pointed Claude at some of the most battle-tested codebases in open source — FreeBSD, Vim, Firefox, Emacs — and walked away. When he came back, the AI had found working exploits…
Cursor shipped version 3 on Wednesday, and the release notes read less like an IDE changelog and more like a manifesto. The opening line from the team: "Built for a world where all code is written…"
Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive. Google just open-sourced a fix, and it's already…
Everyone was so busy arguing about Gemma 4 benchmarks this week that Netflix quietly shipped something genuinely weird on HuggingFace. VOID — Video Object and Interaction Deletion — is an open-weight…
Google dropped Gemma 4 on Wednesday — four open-weight models under a genuine Apache 2.0 license, built from the same research behind Gemini 3. The headline numbers are impressive, but what caught my attention…
Alibaba dropped Qwen 3.6-Plus this week, and the headline number caught my attention: 61.6 on Terminal-Bench 2.0, beating Claude Opus 4.5's 59.3. That's the first time a Qwen model has topped…
A source map file that should never have shipped just gave the entire developer community an X-ray of how production AI coding agents are built. And someone turned that X-ray into GitHub's fastest…
Mistral shipped a model with 119 billion parameters and called it "Small." Under Apache 2.0. With a 256K context window, native vision across 24 languages, and built-in function calling.
NVIDIA dropped Nemotron 3 Super a few weeks ago and it flew under the radar — buried by the Mythos leak drama and GPT-5.4's benchmark parade. That's unfortunate, because this model might matter…
While everyone was busy arguing about GPT-5.4 benchmarks and context window sizes this month, a Turing Award winner quietly closed the largest seed round in European history — $1.03 billion…
OpenAI dropped GPT-5.4 on March 5, and the headline number — 75% on OSWorld-Verified, beating the 72.4% human baseline — made everyone sit up. Almost four weeks later, I've been shipping with it…
Anthropic, the company that built its entire brand on being the responsible AI lab, just accidentally published internal documents about its most powerful model through a misconfigured content management system.
ai research, sakana ai, automated science, agentic systems, nature, peer review, open source
Last Wednesday, Sakana AI's paper describing the AI Scientist system landed in Nature as an open-access publication. The headline result: their v2 system generated a research paper that passed blind peer review…
If you've been anywhere near developer Twitter or Hacker News this quarter, you've seen OpenClaw. The open-source personal AI agent went from zero to 250,000+ GitHub stars faster than any project…
Google just shipped the first mainstream API that collapses the entire ASR-LLM-TTS voice pipeline into a single native audio-to-audio model, and a ten-minute voice session costs roughly twenty-three cents.
Sakana AI's AI Scientist-v2 — the system that autonomously generates research hypotheses, runs experiments, and writes full papers — just got a write-up published in Nature. One of its papers passed…
Apple just leaked the most important platform move of 2026, and most of the coverage is missing the point. Bloomberg's Mark Gurman reported this week that iOS 27 will let third-party AI assistants…
If you've been training or fine-tuning large models, you've probably hit that moment — loss curve looks beautiful for hours, then suddenly spikes into oblivion. You roll back a checkpoint, tweak…
The AI industry packed an entire quarter's worth of announcements into a single week, with NVIDIA unveiling its post-Blackwell Rubin architecture, DeepSeek crossing the trillion-parameter mark, and…