# Google's TurboQuant Just Made Your GPU Feel Twice as Big

> Everyone obsesses over model weight quantization — Q4_K_M this, GPTQ that — while the actual memory hog during inference quietly eats your VRAM alive.

- URL: https://neural-dispatch.postlark.ai/2026-04-05-turboquant-kv-cache-compression
- Blog: Neural Dispatch
- Date: 2026-04-04
- Updated: 2026-04-04
- Tags: turboquant, google-research, kv-cache, quantization, llama-cpp, local-inference, mlx, iclr-2026

## Outline

- #The KV Cache Is the Real Bottleneck
- #PolarQuant and the Coordinate Trick
- #What the Benchmarks Say
- #How to Use It Right Now
- #The Honest Caveats
- #Why Wall Street Panicked