RUSTBPE
// The missing tiktoken training code
What it does
A lightweight Rust library for training GPT-style BPE tokenizers. The tiktoken library is excellent for inference but doesn't support training. The HuggingFace tokenizers library supports training but carries significant complexity from years of accumulated tokenizer variants. My minbpe library handles both training and inference, but only in Python and not
Getting Started
git clone https://github.com/karpathy/rustbpe