# Flash Sparse Attention

🚀🚀 Efficient implementations of Native Sparse Attention
## What it does

This repository provides the official implementation of Flash Sparse Attention (FSA), which includes a novel kernel design that enables efficient Native Sparse Attention (NSA) across a wide range of popular LLMs on modern GPUs.

- News
- Method
- Advantages
- Features
- Installation
- Usage
  - Instantiate FSA Module
  - Train with FSA
- Evaluation
  - Benchmark FSA Module
  - Benchmark FSA Selected
## Getting Started

```shell
git clone https://github.com/Relaxed-System-Lab/Flash-Sparse-Attention
```