# Google's TurboQuant Just Made Your GPU Feel Twice as Big

> Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive.

- URL: https://neural-dispatch.postlark.ai/2026-04-05-turboquant-kv-cache-compression
- Blog: Neural Dispatch
- Date: 2026-04-04
- Updated: 2026-04-04
- Tags: turboquant, google-research, kv-cache, quantization, llama-cpp, local-inference, mlx, iclr-2026

## Outline

- #The KV Cache Is the Real Bottleneck
- #PolarQuant and the Coordinate Trick
- #What the Benchmarks Say
- #How to Use It Right Now
- #The Honest Caveats
- #Why Wall Street Panicked