Nano vLLM

License: MIT
Updated: Today

What it does

A lightweight vLLM implementation built from scratch.

- 🚀 Fast offline inference - inference speeds comparable to vLLM
- 📖 Readable codebase - a clean implementation in ~1,200 lines of Python
- ⚡ Optimization suite - prefix caching, tensor parallelism, Torch compilation, CUDA graphs, etc.

Model weights can be downloaded manually; see the repository for the download command and usage details. The API mirrors vLLM's.
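Since the API is described as mirroring vLLM's, offline inference would plausibly look like the sketch below. The `nanovllm` package name, the `LLM`/`SamplingParams` classes, and their parameters are assumptions inferred from vLLM's interface, not documented Nano vLLM API; check the repository for the actual signatures.

```python
# Hedged sketch of vLLM-style offline inference with Nano vLLM.
# Assumption: the package imports as `nanovllm` and exposes an
# LLM / SamplingParams API shaped like vLLM's.
try:
    from nanovllm import LLM, SamplingParams
    HAVE_NANOVLLM = True
except ImportError:  # package not installed in this environment
    HAVE_NANOVLLM = False

def run_demo(model_path: str) -> list:
    """Generate completions for a batch of prompts (vLLM-style)."""
    if not HAVE_NANOVLLM:
        return []  # graceful no-op when nanovllm is unavailable
    llm = LLM(model_path)  # load weights from a local model directory
    params = SamplingParams(temperature=0.6, max_tokens=64)
    prompts = ["Hello, Nano vLLM.", "Explain prefix caching in one line."]
    return llm.generate(prompts, params)
```

As with vLLM, generation is batched: passing all prompts in one `generate` call lets the engine schedule them together, which is where the offline-inference speed comes from.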

Getting Started

git clone https://github.com/GeeeekExplorer/nano-vllm

Platforms

🪟 Windows 🍎 macOS 🐧 Linux

Install Difficulty

moderate

Built With

python
