KREUZBERG
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from ...
What it does
Extract text and metadata from a wide range of file formats (75+), generate embeddings and post-process at native speeds without needing a GPU.
- Extensible architecture – Plugin system for custom OCR backends, validators, post-processors, and document extractors
- Polyglot – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, Go, Java, C#, PHP, Elixir, R, and C
- 75+ file formats – PDF,
Getting Started
git clone https://github.com/kreuzberg-dev/kreuzberg