Doge Family of Small Language Models
English | 简体中文
Train a 20M-parameter language model in just 3 hours! 🚀
SmallDoge is a family of dynamic, ultra-fast small language models designed for efficiency and accessibility.
```bash
git clone https://github.com/SmallDoges/small-doge.git
cd small-doge
pip install -e .
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_name = "SmallDoge/Doge-60M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text (do_sample=True so the temperature setting takes effect)
prompt = "Explain machine learning in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
```bash
# Install WebUI
pip install -e '.[webui]'

# Launch interface
small-doge-webui
```
Access: http://localhost:7860 (Frontend) | http://localhost:8000 (API)
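To script against the backend, something like the following may work, assuming the API on port 8000 exposes an OpenAI-compatible chat route; that route is an assumption here, so check the WebUI docs for the actual endpoints and payload schema:

```python
import requests

# Hypothetical endpoint: assumes the WebUI backend is OpenAI-compatible
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "SmallDoge/Doge-60M-Instruct",
        "messages": [{"role": "user", "content": "Hello, Doge!"}],
    },
    timeout=60,
)
print(response.json())
```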
📖 Detailed guides: Quick Start | Installation | Training
| Model | Size | Speed (i7-11 CPU) | MMLU | Use Case |
|---|---|---|---|---|
| Doge-20M | 20M | 142 tok/s | 25.4 | Ultra-fast prototyping |
| Doge-60M | 60M | 62 tok/s | 26.4 | Balanced performance |
| Doge-160M | 160M | 28 tok/s | 29.2 | Better reasoning |
| Doge-320M | 320M | 16 tok/s | 33.8 | Production ready |
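Throughput depends heavily on hardware, so treat the speed column as indicative. A rough way to measure tokens per second on your own machine (an illustrative sketch, not the benchmark script behind the table):

```python
import time
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "SmallDoge/Doge-20M"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Greedy decoding of a fixed number of new tokens, timed end to end
inputs = tokenizer("The quick brown fox", return_tensors="pt")
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tok/s")
```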
Instruction Models: Add `-Instruct` to any model name for chat-optimized versions.

Checkpoints: Add `-checkpoint` for continued training (see Model Docs).
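For chat-optimized usage, the `-Instruct` models can be driven through the tokenizer's chat template; this assumes the instruct checkpoints ship one, which is standard for chat-tuned models on the Hugging Face Hub:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "SmallDoge/Doge-60M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```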
Key Innovations: Doge combines Dynamic Mask Attention for sequence transformation with a Cross-Domain Mixture of Experts for state transformation, which is what makes the models dynamic and ultra-fast.
SmallDoge supports complete three-stage training: pre-training, supervised fine-tuning (SFT), and preference optimization (DPO). Data processing, training recipes, and per-model training times on an RTX 4090 are covered in the Training Guide linked below; a minimal SFT sketch follows as well.
📚 Learn more: Training Guide
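As an illustration of the SFT stage, here is a minimal sketch using Hugging Face TRL. The dataset below is a placeholder; the repo's own trainers under `src/small_doge/trainer` and the configs under `recipes/doge` are the supported path:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Base checkpoint to fine-tune (use a -checkpoint variant for continued training)
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M", trust_remote_code=True)

# Placeholder conversational dataset; substitute the corpus from the recipe
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(output_dir="doge-20m-sft"),
)
trainer.train()
```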
| Model | MMLU | ARC | PIQA | HellaSwag | Winogrande |
|---|---|---|---|---|---|
| Doge-20M | 25.4 | 29.8 | 58.4 | 27.3 | 50.2 |
| Doge-60M | 26.4 | 37.9 | 61.4 | 31.5 | 50.8 |
| Doge-160M | 29.2 | 44.4 | 70.1 | 43.4 | 52.2 |
| Doge-320M | 33.8 | 52.1 | 73.9 | 52.7 | 55.0 |
| Model | IFEval | MMLU | BBH | Performance |
|---|---|---|---|---|
| Doge-20M-Instruct | 7.3 | 26.3 | 18.3 | Good for basic chat |
| Doge-60M-Instruct | 7.4 | 27.5 | 27.7 | Balanced chat model |
| Doge-160M-Instruct | 16.8 | 29.7 | 29.1 | Advanced reasoning |
🔍 Evaluation toolkit: Evaluation Guide
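For reference, the base-model numbers above can be approximated with EleutherAI's lm-evaluation-harness; this is an illustration, and the toolkit under `evaluation/` is the supported route:

```python
import lm_eval

# Evaluate a base checkpoint on the benchmarks from the table above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SmallDoge/Doge-60M,trust_remote_code=True",
    tasks=["mmlu", "arc_challenge", "piqa", "hellaswag", "winogrande"],
)
print(results["results"])
```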
```
small-doge/
├── src/small_doge/       # Core implementation
│   ├── models/           # Model architectures
│   ├── trainer/          # Training code
│   ├── processor/        # Data processing
│   └── webui/            # Web interface
├── recipes/              # Training recipes
│   └── doge/             # Doge model configs
├── examples/             # Tutorials & examples
├── evaluation/           # Evaluation toolkit
├── docs/                 # Documentation
└── assets/               # Images & resources
```
We welcome contributions! Open an issue or submit a pull request to get started.
```bibtex
@misc{smalldoges2025,
  title={SmallDoges: A Family of Dynamic Ultra-Fast Small Language Models},
  author={Jingze Shi and Yifan Wu and Bingheng Wu and Yuyu Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.