StreamingVLM: Real-Time Understanding for Infinite Video Streams
What it does
Paper | Slides | Demo Page

StreamingVLM enables real-time, stable understanding of effectively infinite video streams by keeping a compact KV cache and aligning training with streaming inference. It avoids the quadratic cost of full attention and the pitfalls of naive sliding windows, runs at up to 8 FPS on a single NVIDIA H100, and achieves a 66.18% win rate against GPT-4o mini on a new long-video benchmark. It also improves general VQA performance without task-specific fine-tuning.
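The core idea of a compact KV cache can be sketched as keeping a small set of early "sink" entries forever plus a sliding window of the most recent entries, so memory stays bounded no matter how long the stream runs. This is a minimal illustrative sketch, not the StreamingVLM implementation; the class and parameter names (`CompactKVCache`, `num_sink`, `window`) are assumptions.

```python
from collections import deque

class CompactKVCache:
    """Illustrative bounded KV cache: retain the first `num_sink`
    entries (attention sinks) plus a rolling window of the most
    recent entries. Hypothetical names, not the StreamingVLM API."""

    def __init__(self, num_sink=4, window=16):
        self.num_sink = num_sink
        self.sink = []                       # earliest entries, kept forever
        self.recent = deque(maxlen=window)   # rolling window of recent entries

    def append(self, kv):
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)           # deque evicts the oldest entry

    def view(self):
        # What attention sees at each step: sinks + recent window.
        return self.sink + list(self.recent)

cache = CompactKVCache(num_sink=2, window=3)
for t in range(10):          # simulate a growing stream of 10 entries
    cache.append(t)
print(cache.view())          # → [0, 1, 7, 8, 9]  (size stays bounded at 5)
```

The cache size is constant (`num_sink + window`) regardless of stream length, which is what keeps per-step attention cost flat instead of quadratic.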
Getting Started
Clone the repository:

git clone https://github.com/mit-han-lab/streaming-vlm