I'm watching several llama.cpp developments.
- NVFP4 CUDA for sm120 (consumer Blackwell)
- Multi-token prediction (MTP) for Qwen3.5 and Gemma-4
- EAGLE3 speculative decoding
- self-speculative decoding for Qwen3.5
- TurboQuant
- DFlash
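Several of these (EAGLE3, self-speculative decoding) build on the same draft-and-verify idea: a cheap draft model proposes a few tokens, and the target model verifies them, accepting the longest agreeing prefix so that agreed tokens come essentially for free. A minimal greedy sketch, using toy stand-in models rather than anything from llama.cpp itself:

```python
# Toy sketch of greedy speculative decoding (draft-and-verify).
# draft_model and target_model are hypothetical stand-ins, not llama.cpp APIs.

def target_model(ctx):
    # Toy "target": the next token is always last token + 1.
    return ctx[-1] + 1

def draft_model(ctx, k):
    # Toy "draft": agrees with the target for the first two tokens,
    # then guesses 0 (so verification rejects the tail).
    return [ctx[-1] + i + 1 if i < 2 else 0 for i in range(k)]

def speculative_step(ctx, k=4):
    proposed = draft_model(ctx, k)
    accepted = []
    for tok in proposed:
        if target_model(ctx + accepted) == tok:
            accepted.append(tok)  # draft matched target: accepted for free
        else:
            break  # first mismatch ends the accepted prefix
    # The target always contributes one token of its own (the correction).
    accepted.append(target_model(ctx + accepted))
    return ctx + accepted

print(speculative_step([0]))  # → [0, 1, 2, 3]
```

Here one verification pass yields three tokens (two accepted drafts plus the target's correction) instead of one; the real schemes differ mainly in where the draft comes from (a separate small model, extra MTP heads, or the target's own early layers).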