DFLASH
// DFlash: Block Diffusion for Flash Speculative Decoding
What it does
Paper | Blog | Models

DFlash is a lightweight block diffusion model designed for speculative decoding. It enables efficient and high-quality parallel drafting.

https://github.com/user-attachments/assets/5b29cabb-eb95-44c9-8ffe-367c0758de8c

- openai/gpt-oss-20b: https://huggingface.co/z-lab/gpt-oss-20b-DFlash
- Qwen3-4B: https://huggingface.co/z-lab/Qwen3-4B-DFlash-b16
- Qwen3-8B:
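
To make "parallel drafting for speculative decoding" concrete, here is a minimal sketch of the generic greedy draft-then-verify loop. It is not DFlash's implementation: the drafter and target are toy stand-in functions, and every name (`accept_prefix`, `speculative_decode`, `toy_draft_block`, `toy_target_next`) is hypothetical. A real system would score all draft positions in a single target forward pass; the toy calls the target once per position for clarity.

```python
from typing import Callable, List

Token = int


def accept_prefix(draft: List[Token], target: List[Token]) -> List[Token]:
    """Keep draft tokens until the first mismatch with the target's predictions.

    `target[i]` is the target model's token for the position of `draft[i]`.
    On a mismatch the target's token replaces the draft token and we stop.
    If every draft token matches, one extra target token is appended when present.
    """
    accepted: List[Token] = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # correction token from the target
            return accepted
        accepted.append(d)
    if len(target) > len(draft):
        accepted.append(target[len(draft)])  # bonus token after a full match
    return accepted


def speculative_decode(
    prompt: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    target_next: Callable[[List[Token]], Token],
    block_size: int,
    max_new_tokens: int,
) -> List[Token]:
    """Greedy speculative decoding with a block drafter (illustrative only)."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # The drafter proposes a whole block of tokens at once.
        draft = draft_block(out, block_size)
        # Target predictions at each draft position, plus one position beyond.
        target = [target_next(out + draft[:i]) for i in range(len(draft) + 1)]
        out.extend(accept_prefix(draft, target))
    return out[:limit]


def toy_target_next(context: List[Token]) -> Token:
    """Hypothetical target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft_block(context: List[Token], block_size: int) -> List[Token]:
    """Hypothetical drafter: mostly agrees with the target, but errs at the third slot."""
    block: List[Token] = []
    last = context[-1]
    for i in range(block_size):
        step = 2 if i == 2 else 1  # deliberate mistake at index 2
        last = (last + step) % 10
        block.append(last)
    return block


if __name__ == "__main__":
    print(speculative_decode([0], toy_draft_block, toy_target_next, 4, 8))
```

Because each draft token is kept only when it equals the target's own greedy choice, the output matches what the target would produce by itself. The drafter only affects how many tokens are accepted per verification step.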
Getting Started
git clone https://github.com/z-lab/dflash