REX OMNI
// Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
What it does
Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that recasts object detection, along with a wide range of other visual perception tasks, as a simple next-token prediction problem.

- [2026-01-10] Pointing task fine-tuning is now supported: Rex-Omni can be trained on custom pointing datasets with supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO). See the Fine-tuning Guide for details.
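To make "detection as next-token prediction" concrete, here is a minimal sketch of the general idea: continuous pixel coordinates are quantized into a fixed number of bins so that each box becomes a short sequence of discrete tokens a language model can emit one at a time. This is an illustration under stated assumptions, not Rex-Omni's actual output format. The bin count `NUM_BINS`, the `<k>` token spelling, and the names `Box`, `quantize`, `dequantize`, `encode_boxes` and `decode_boxes` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical quantization resolution; Rex-Omni's real value may differ.
NUM_BINS = 1000


@dataclass
class Box:
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(value / size * num_bins)
    # Clamp so that a coordinate exactly at the image edge stays in range.
    return min(max(bin_index, 0), num_bins - 1)


def dequantize(bin_index: int, size: float, num_bins: int = NUM_BINS) -> float:
    """Map a bin back to the pixel coordinate at the center of that bin."""
    return (bin_index + 0.5) / num_bins * size


def encode_boxes(boxes: list[Box], width: float, height: float) -> list[str]:
    """Serialize boxes as: label token, then four coordinate tokens (x0, y0, x1, y1)."""
    tokens: list[str] = []
    for box in boxes:
        tokens.append(box.label)
        tokens.extend([
            f"<{quantize(box.x0, width)}>",
            f"<{quantize(box.y0, height)}>",
            f"<{quantize(box.x1, width)}>",
            f"<{quantize(box.y1, height)}>",
        ])
    return tokens


def decode_boxes(tokens: list[str], width: float, height: float) -> list[Box]:
    """Inverse of encode_boxes: read groups of one label plus four coordinate tokens."""
    boxes: list[Box] = []
    for i in range(0, len(tokens), 5):
        label = tokens[i]
        bins = [int(t.strip("<>")) for t in tokens[i + 1:i + 5]]
        boxes.append(Box(
            label,
            dequantize(bins[0], width),
            dequantize(bins[1], height),
            dequantize(bins[2], width),
            dequantize(bins[3], height),
        ))
    return boxes
```

For example, `encode_boxes([Box("cat", 64.0, 32.0, 320.0, 240.0)], width=640, height=480)` yields `['cat', '<100>', '<66>', '<500>', '<500>']`, and decoding recovers each coordinate to within one bin width. Because the output is just a token sequence, the same decoding loop that generates text can generate detections; a finer bin grid trades a larger vocabulary for lower quantization error.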
Getting Started
git clone https://github.com/IDEA-Research/Rex-Omni