ToolShelf

StreamingVLM: Real-Time Understanding for Infinite Video Streams

License: MIT
Updated: Today

What it does

Paper | Slides | Demo Page

StreamingVLM enables real-time, stable understanding of effectively infinite video streams by maintaining a compact KV cache and aligning training with streaming inference. It avoids the quadratic cost of full attention and the pitfalls of naive sliding windows, runs at up to 8 FPS on a single H100, and achieves a 66.18% win rate against GPT-4o mini on a new long-video benchmark. It also improves general VQA performance without task-specific fine-tuning.
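The compact-KV-cache idea can be illustrated with a minimal sketch: keep a few "sink" entries from the very start of the stream plus a rolling window of recent entries, evicting everything in between so memory stays bounded no matter how long the stream runs. The class and parameter names below are illustrative assumptions, not the repo's actual API.

```python
from collections import deque

class CompactKVCache:
    """Toy bounded cache: a few permanent 'sink' entries + a recent window."""

    def __init__(self, num_sink=4, window=16):
        self.num_sink = num_sink            # earliest entries, always kept
        self.sink = []                      # filled once, never evicted
        self.recent = deque(maxlen=window)  # deque evicts oldest automatically

    def append(self, kv):
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)

    def entries(self):
        return self.sink + list(self.recent)

cache = CompactKVCache(num_sink=4, window=16)
for t in range(100):        # simulate 100 streamed KV entries
    cache.append(t)
print(len(cache.entries()))  # → 20 (4 sinks + 16 recent), regardless of stream length
```

The point is that cache size (and thus per-step attention cost) stays constant as the stream grows, which is what lets a streaming VLM handle effectively infinite video without quadratic blow-up.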

Getting Started

git
git clone https://github.com/mit-han-lab/streaming-vlm

Platforms

🪟 Windows · 🍎 macOS · 🐧 Linux

Install Difficulty

Moderate

Built With

Python
