SmallDoges

Discord HuggingFace License

English | 简体中文

SmallDoge: Ultra-Fast Small Language Models

Train a 20M parameter language model in just 3 hours! 🚀

SmallDoge is a family of dynamic, ultra-fast small language models designed for efficiency and accessibility.

✨ Key Features

  • 🚀 Ultra-Fast Training: 3-hour training for 20M models
  • 💡 Innovative Architecture: Dynamic Mask Attention + Cross Domain MoE
  • 🏎️ Lightning Inference: 142 tokens/s on i7-11 CPU
  • 🔧 Complete Toolkit: Pre-training → Instruction Fine-tuning → Reasoning Fine-tuning
  • 🌐 Web Interface: Built-in chat interface and OpenAI-compatible API
Doge-60M-Instruct demo

Webui-Doge-320M-Instruct running on i7-11 CPU

🚀 Quick Start

Installation

```bash
git clone https://github.com/SmallDoges/small-doge.git
cd small-doge
pip install -e .
```

Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_name = "SmallDoge/Doge-60M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text (do_sample=True is required for temperature to take effect)
prompt = "Explain machine learning in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Web Interface

```bash
# Install WebUI
pip install -e '.[webui]'

# Launch interface
small-doge-webui
```

Access: http://localhost:7860 (Frontend) | http://localhost:8000 (API)
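Because the backend is OpenAI-compatible, a plain standard-library client can talk to it. A minimal sketch, assuming the conventional `/v1/chat/completions` route and that the served model name matches the Hugging Face ID (both are assumptions based on the OpenAI convention, not confirmed by this README):

```python
import json
import urllib.request

def build_chat_request(prompt, model="SmallDoge/Doge-60M-Instruct",
                       base_url="http://localhost:8000"):
    """Build an OpenAI-style chat-completions request for the local API.

    The /v1/chat/completions path follows the OpenAI convention and is
    an assumption here, not confirmed by this README.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the WebUI backend running, the request could be sent like:
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```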

📖 Detailed guides: Quick Start | Installation | Training

📊 Available Models

| Model | Size | Speed (i7-11 CPU) | MMLU | Use Case |
|-------|------|-------------------|------|----------|
| Doge-20M | 20M | 142 tok/s | 25.4 | Ultra-fast prototyping |
| Doge-60M | 60M | 62 tok/s | 26.4 | Balanced performance |
| Doge-160M | 160M | 28 tok/s | 29.2 | Better reasoning |
| Doge-320M | 320M | 16 tok/s | 33.8 | Production ready |
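The table encodes a speed/quality trade-off. As a small illustration (the numbers are copied from the table above; this helper is not part of the SmallDoge toolkit), picking the fastest model that clears a quality bar can be scripted:

```python
# Speed (tok/s on i7-11 CPU) and MMLU scores from the table above.
MODELS = {
    "Doge-20M":  {"tok_s": 142, "mmlu": 25.4},
    "Doge-60M":  {"tok_s": 62,  "mmlu": 26.4},
    "Doge-160M": {"tok_s": 28,  "mmlu": 29.2},
    "Doge-320M": {"tok_s": 16,  "mmlu": 33.8},
}

def fastest_model(min_mmlu=0.0):
    """Return the fastest model meeting a minimum MMLU score, or None."""
    candidates = [(v["tok_s"], name) for name, v in MODELS.items()
                  if v["mmlu"] >= min_mmlu]
    return max(candidates)[1] if candidates else None
```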

Instruction Models: Add -Instruct to any model name for chat-optimized versions.

Checkpoints: Add -checkpoint for continued training (see Model Docs).

🏗️ Architecture

Doge Architecture

Key Innovations:

  • Dynamic Mask Attention: Dynamic attention mechanism for efficient long sequences
  • Cross Domain Mixture of Experts: Sparse experts with dense-to-sparse continuation training
  • WSD Scheduler: Warmup-Stable-Decay for seamless checkpoint resumption
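The WSD idea is simple enough to sketch. In this minimal, illustrative schedule (the warmup/decay fractions and linear shapes are assumptions, not SmallDoge's exact configuration), the learning rate stays flat for most of training, which is why a checkpoint taken mid-run can resume without re-warming:

```python
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.1, decay_frac=0.1):
    """Illustrative Warmup-Stable-Decay learning-rate schedule.

    Three phases: linear warmup to peak_lr, a long constant (stable)
    phase, then a linear decay to zero at the end of training.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:          # warmup: 0 -> peak
        return peak_lr * step / max(1, warmup_steps)
    if step < decay_start:           # stable: constant at peak
        return peak_lr
    remaining = total_steps - step   # decay: peak -> 0
    return peak_lr * remaining / max(1, decay_steps)
```

A checkpoint saved anywhere in the stable phase sees the same constant rate on resumption, so continued training (e.g. from the `-checkpoint` variants) picks up exactly where it left off.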

🎓 Training Pipeline

SmallDoge supports complete three-stage training:

  1. Pre-training → Base models (Doge-Base)
  2. Instruction Fine-tuning → Chat models (Doge-Instruct)
  3. Reasoning Fine-tuning → Reasoning models (Doge-Reason)

Key Features:

  • 🚀 One-stop processor: Unified data handling across all stages
  • 🔧 Flexible recipes: Pre-configured training configs
  • 📊 Efficient training: Optimized for small models
  • 🔄 Seamless continuation: WSD scheduler for checkpoint resumption

Training Times (RTX 4090):

  • Doge-20M: 14 hours | Doge-60M: 128 hours | Doge-160M: 522 hours | Doge-320M: 1856 hours

📚 Learn more: Training Guide

📈 Evaluation Results

Base Models

| Model | MMLU | ARC | PIQA | HellaSwag | Winogrande |
|-------|------|-----|------|-----------|------------|
| Doge-20M | 25.4 | 29.8 | 58.4 | 27.3 | 50.2 |
| Doge-60M | 26.4 | 37.9 | 61.4 | 31.5 | 50.8 |
| Doge-160M | 29.2 | 44.4 | 70.1 | 43.4 | 52.2 |
| Doge-320M | 33.8 | 52.1 | 73.9 | 52.7 | 55.0 |

Instruction Models

| Model | IFEval | MMLU | BBH | Performance |
|-------|--------|------|-----|-------------|
| Doge-20M-Instruct | 7.3 | 26.3 | 18.3 | Good for basic chat |
| Doge-60M-Instruct | 7.4 | 27.5 | 27.7 | Balanced chat model |
| Doge-160M-Instruct | 16.8 | 29.7 | 29.1 | Advanced reasoning |

🔍 Evaluation toolkit: Evaluation Guide

🛠️ Use Cases

  • 🤖 Edge AI: Deploy on resource-constrained devices
  • 🎮 Gaming: Real-time NPC dialogue and game mechanics
  • 📱 Mobile Apps: On-device AI assistants
  • 🔬 Research: Fast prototyping and experimentation
  • 📚 Education: Learning AI/ML with manageable models
  • 🏭 Industry: Lightweight production deployments

📦 Project Structure

```
small-doge/
├── src/small_doge/   # Core implementation
│   ├── models/       # Model architectures
│   ├── trainer/      # Training code
│   ├── processor/    # Data processing
│   └── webui/        # Web interface
├── recipes/          # Training recipes
│   └── doge/         # Doge model configs
├── examples/         # Tutorials & examples
├── evaluation/       # Evaluation toolkit
├── docs/             # Documentation
└── assets/           # Images & resources
```

🤝 Contributing

We welcome contributions! Here's how you can help:

  • 🐛 Report bugs: GitHub Issues
  • 💡 Suggest features: Discussions
  • 📚 Improve docs: Submit PRs for documentation
  • 🏋️ Share models: Contribute trained models and recipes
  • 💬 Join community: Discord

📚 Documentation

📄 Citation

```bibtex
@misc{smalldoges2025,
  title={SmallDoges: A Family of Dynamic Ultra-Fast Small Language Models},
  author={Jingze Shi and Yifan Wu and Bingheng Wu and Yuyu Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.


Built with ❤️ by the SmallDoge Team

Star History

Give us a ⭐ if you find SmallDoge helpful!