Nano vLLM
What it does
A lightweight vLLM implementation built from scratch.

- 🚀 Fast offline inference: inference speeds comparable to vLLM
- 📖 Readable codebase: a clean implementation in ~1,200 lines of Python
- ⚡ Optimization suite: prefix caching, tensor parallelism, Torch compilation, CUDA graphs, etc.

The API mirrors vLLM's; see the example script in the repository for usage. Model weights can be downloaded manually; see the repository README for the exact command.
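Since the API mirrors vLLM's, a minimal offline-inference call might look like the sketch below. The `nanovllm` import path, the model directory, the constructor keywords, and the shape of the returned outputs are assumptions modeled on vLLM's interface, not verified against this codebase; a CUDA-capable GPU and downloaded weights are required.

```python
# Sketch only: names below are assumed to follow vLLM's offline API.
from nanovllm import LLM, SamplingParams  # assumed import path

# Point at a locally downloaded model directory (path is illustrative).
llm = LLM("/path/to/model", tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Explain tensor parallelism in one sentence."]

# Batch generation, as in vLLM's offline mode.
outputs = llm.generate(prompts, sampling_params)
print(outputs[0])  # one result per input prompt
```

Because the interface tracks vLLM, existing vLLM offline scripts should port over with little more than an import change.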
Getting Started
```shell
git clone https://github.com/GeeeekExplorer/nano-vllm
```
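After cloning, a typical next step is to install the package into the active environment and fetch model weights with the Hugging Face CLI. The editable install and the model name below are assumptions for illustration; substitute whichever model the project's examples expect.

```shell
# Install the cloned package into the current environment
# (assumes the repo ships a setup.py or pyproject.toml).
cd nano-vllm
pip install -e .

# Download model weights manually via the Hugging Face CLI
# (model repo and target directory are illustrative).
huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B
```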