A paper from Google could make local LLMs even easier to run.
Google researchers have proposed TurboQuant, a two-stage quantization method that, according to a recent arXiv preprint, can ...