Rex Omni

Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)

13 · Emerging · Unknown
License: NOASSERTION
Updated: Today

What it does

Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that recasts object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.

[2026-01-10] Pointing Task Finetuning is now supported. Train Rex-Omni on custom pointing datasets with SFT and GRPO. See the Fine-tuning Guide in the repository for details.
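To make the "detection as next-token prediction" idea concrete, here is a minimal sketch of how a bounding box could be turned into a short sequence of discrete coordinate tokens and decoded back. This listing does not describe Rex-Omni's actual encoding, so everything here is an assumption for illustration: the 1000-bin quantization, the `box_to_tokens` and `tokens_to_box` helper names, and the (x0, y0, x1, y1) ordering are hypothetical and are not part of the Rex-Omni API.

```python
from typing import List, Tuple

# Assumed bin count, chosen only for illustration; not taken from the listing.
NUM_BINS = 1000

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def box_to_tokens(box: Box, width: int, height: int,
                  num_bins: int = NUM_BINS) -> List[int]:
    """Quantize pixel coordinates into integer bin indices (hypothetical helper)."""
    def quantize(value: float, size: int) -> int:
        idx = int(value * num_bins / size)
        # Clamp so that a coordinate on the far image edge stays in range.
        return min(max(idx, 0), num_bins - 1)

    x0, y0, x1, y1 = box
    return [quantize(x0, width), quantize(y0, height),
            quantize(x1, width), quantize(y1, height)]


def tokens_to_box(tokens: List[int], width: int, height: int,
                  num_bins: int = NUM_BINS) -> Box:
    """Map bin indices back to pixel coordinates at the bin centers."""
    def dequantize(idx: int, size: int) -> float:
        return (idx + 0.5) / num_bins * size

    t0, t1, t2, t3 = tokens
    return (dequantize(t0, width), dequantize(t1, height),
            dequantize(t2, width), dequantize(t3, height))


if __name__ == "__main__":
    tokens = box_to_tokens((64, 48, 320, 240), width=640, height=480)
    print(tokens)
    print(tokens_to_box(tokens, width=640, height=480))
```

Under this scheme a language model would emit four integers per box as ordinary vocabulary items, and the round trip loses at most one bin width of precision per coordinate.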

Getting Started

git
git clone https://github.com/IDEA-Research/Rex-Omni

Platforms

🪟 Windows · 🍎 macOS · 🐧 Linux

Install Difficulty

moderate

Built With

Jupyter Notebook

Community Reactions