ToolShelf
Rustbpe
The missing tiktoken training code

13
Emerging
Unknown
License
MIT
Updated
Today

What it does

> The missing tiktoken training code. A lightweight Rust library for training GPT-style BPE tokenizers. The tiktoken library is excellent for inference but doesn't support training. The HuggingFace tokenizers library supports training but carries significant complexity from years of accumulated tokenizer variants. My minbpe library handles both training and inference, but only in Python and not…
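For readers unfamiliar with what "training a BPE tokenizer" involves, here is a minimal byte-level sketch in Rust. It is not rustbpe's API or implementation: the function names (`pair_counts`, `merge`, `train_bpe`) are hypothetical, and it skips the regex pre-splitting that GPT-style tokenizers normally apply before merging. It only illustrates the core loop: count adjacent pairs, merge the most frequent one into a new token id, repeat.

```rust
use std::collections::HashMap;

/// Count how often each adjacent pair of token ids occurs.
fn pair_counts(ids: &[u32]) -> HashMap<(u32, u32), usize> {
    let mut counts = HashMap::new();
    for w in ids.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
}

/// Replace each non-overlapping occurrence of `pair` with `new_id`,
/// scanning left to right.
fn merge(ids: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len());
    let mut i = 0;
    while i < ids.len() {
        if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(ids[i]);
            i += 1;
        }
    }
    out
}

/// Hypothetical helper (not part of rustbpe): learn up to `num_merges`
/// merges over the raw bytes of `text`. Returns the merges in the order
/// they were learned, plus the final token sequence.
fn train_bpe(text: &str, num_merges: usize) -> (Vec<((u32, u32), u32)>, Vec<u32>) {
    // Start from raw bytes, so the base vocabulary is ids 0..=255.
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
    let mut merges = Vec::new();
    for step in 0..num_merges {
        let counts = pair_counts(&ids);
        // Pick the most frequent pair; break ties by the smallest pair
        // so the result does not depend on HashMap iteration order.
        let best = counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(pair, count)| (*pair, *count));
        let Some((pair, count)) = best else { break };
        if count < 2 {
            break; // nothing repeats, so further merges gain nothing
        }
        let new_id = 256 + step as u32;
        ids = merge(&ids, pair, new_id);
        merges.push((pair, new_id));
    }
    (merges, ids)
}
```

Under these assumptions, `train_bpe("aaabdaaabac", 3)` learns three merges and compresses the 11 input bytes down to 5 tokens. A production trainer would avoid recounting all pairs on every step; this sketch favors readability over speed.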

Getting Started

git
git clone https://github.com/karpathy/rustbpe

Platforms

🪟 windows 🍎 mac 🐧 linux

Install Difficulty

moderate

Built With

rust

Community Reactions