MLX LoRA
🧠 Fine-tuning LLMs on Apple Silicon with LoRA
Teaching a 7B-parameter model about Stanley Cup winners on an Apple Silicon laptop: about 45 minutes of training, producing a 6.5MB adapter.
📊 The Setup
Base models have knowledge cutoffs. Mistral-7B trained through 2023 can't answer questions about 2024 events. Full fine-tuning needs 100GB+ of GPU memory and rewrites billions of parameters.
LoRA freezes the base weights and trains two small low-rank matrices whose product nudges the model's behavior. Think editing a config file instead of recompiling the kernel.
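Concretely, a frozen weight matrix W gets a trainable rank-r update ΔW = B·A. A minimal sketch of the idea (illustrative only, not mlx-lm's actual implementation; the alpha value is an assumption):

```python
# Minimal sketch of the LoRA update (illustrative; not mlx-lm's actual code).
import numpy as np

d, r = 4096, 16                    # hidden size, LoRA rank
alpha = 32                         # scaling hyperparameter (assumed value)
W = np.random.randn(d, d)          # frozen base weight: never updated
A = np.random.randn(r, d) * 0.01   # trainable, small random init
B = np.zeros((d, r))               # trainable, zero init so training starts at W

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * B @ A).T, but without
    # ever materializing the full-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Only A and B train: 2 * d * r = 131,072 params vs d * d = 16.8M for this layer.
```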
🏒 What I Built
Fine-tuned Mistral-7B on NHL Stanley Cup data from 1915-2025 after watching Apple’s WWDC 2025 demo on football fine-tuning.
```
# Before
Q: Who won the Stanley Cup in 2024?
A: I cannot provide information about events after 2023.

# After
Q: Who won the Stanley Cup in 2024?
A: The Florida Panthers won the Stanley Cup in 2024, defeating the Edmonton Oilers 4-3.
```
⚡ Performance Metrics
- Training time: 45 minutes on my Apple Silicon laptop (M4 Max)
- Adapter size: 6.5MB (0.17% of base model)
- Training data examples: ~127
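As a rough sanity check on that adapter size (assuming LoRA on the q/v attention projections of the last 16 layers, which I believe is mlx-lm's default, stored at fp16):

```python
# Back-of-envelope check of the 6.5MB adapter size (assumptions in comments).
hidden = 4096      # Mistral-7B hidden size
kv_dim = 1024      # v_proj output dim (8 KV heads x 128, grouped-query attention)
rank = 16
layers = 16        # mlx-lm applies LoRA to the last 16 layers by default (assumed)

q_params = rank * hidden + hidden * rank   # A and B for q_proj
v_params = rank * hidden + kv_dim * rank   # A and B for v_proj
total = layers * (q_params + v_params)
print(f"{total:,} params -> {total * 2 / 1e6:.1f} MB at fp16")
# 3,407,872 params -> 6.8 MB, in the same ballpark as the reported 6.5MB
```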
🛠️ Technical Details
Data Format
```json
{
  "messages": [
    { "role": "user", "content": "Who won the Stanley Cup in 2024?" },
    {
      "role": "assistant",
      "content": "In 2024, the Florida Panthers won the Stanley Cup, defeating the Edmonton Oilers with a series score of 4-3."
    }
  ]
}
```
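Each record like this goes on its own line of the .jsonl training file. A sketch of the conversion (the repo's prepare_data.py may differ; the CSV column names here are assumptions):

```python
# Sketch: convert a CSV of Cup winners into chat-format JSONL.
# Column names (year, winner, loser, series) are assumed, not the repo's schema.
import csv
import json

with open("stanley_cup.csv") as src, open("stanley_cup.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        record = {
            "messages": [
                {"role": "user",
                 "content": f"Who won the Stanley Cup in {row['year']}?"},
                {"role": "assistant",
                 "content": f"In {row['year']}, the {row['winner']} won the Stanley Cup, "
                            f"defeating the {row['loser']} with a series score of {row['series']}."},
            ]
        }
        dst.write(json.dumps(record) + "\n")
```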
Optimal Parameters
```sh
mlx_lm.lora \
  --model "mlx-community/Mistral-7B-Instruct-v0.3-4bit" \
  --train \
  --data stanley_cup.jsonl \
  --adapter-path adapters/stanley_cup \
  --iters 2500 \
  --learning-rate 1.5e-5 \
  --lora-ranks 16 \
  --lora-dropout 0.1
```
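Note: in recent mlx-lm releases, LoRA rank and dropout may need to go in a YAML config passed via --config rather than CLI flags. A sketch, assuming that layout (the key names are an assumption; check `mlx_lm.lora --help` for your version):

```sh
# If your mlx-lm version rejects --lora-ranks / --lora-dropout, try a config file.
cat > lora_config.yaml <<'EOF'
lora_parameters:
  rank: 16
  dropout: 0.1
EOF
mlx_lm.lora --config lora_config.yaml --train --data stanley_cup.jsonl
```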
🔍 Key Findings
Unified memory is a real Apple Silicon advantage. MLX leverages the shared-memory architecture, so there is no CPU/GPU copy overhead.
Higher rank helps domain-specific tasks. Rank 16 improved accuracy significantly over the default of 8.
💡 Practical Applications
LoRA works well for teaching a model narrow tasks, such as grammar correction or generating product descriptions from names and categories.
LoRA adapters are tiny (a few MB) and can be swapped in and out as needed, so one base model can serve several task-specific adapters, as in the sketch below.
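For example, with mlx-lm's Python API (a minimal sketch; chat templating is omitted, and the adapter path is the one from this repo):

```python
# Sketch: serve one base model with a swappable adapter via mlx-lm's Python API.
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    adapter_path="adapters/stanley_cup",  # point at a different adapter to switch tasks
)
print(generate(model, tokenizer, prompt="Who won the Stanley Cup in 2024?",
               max_tokens=64))
```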
⚠️ Limitations
LoRA is a low-rank approximation, which limits how far the model can adapt: it is better at absorbing new facts than at learning genuinely new skills.
Fine-tuning doesn't guarantee correct factual recall, especially when questions are phrased differently from the training examples. The model may know "Florida Panthers won in 2024" but fail when asked "2024 Stanley Cup champions?"
LoRA can also introduce "intruder dimensions" that degrade performance on general tasks. Training too long hurts, and the learning rate needs careful tuning.
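One way to probe the paraphrase sensitivity mentioned above, in the spirit of benchmark.py (a hypothetical snippet with naive substring grading):

```python
# Hypothetical probe: ask the same fact several ways, grade by substring match.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit",
                        adapter_path="adapters/stanley_cup")
paraphrases = [
    "Who won the Stanley Cup in 2024?",
    "2024 Stanley Cup champions?",
    "Which team won the Cup in 2024?",
]
for q in paraphrases:
    answer = generate(model, tokenizer, prompt=q, max_tokens=64)
    print("PASS" if "Florida Panthers" in answer else "FAIL", "|", q)
```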
🔮 Future
For reliable factual retrieval, I'm exploring RAG (Retrieval-Augmented Generation) and agentic approaches that can query structured data sources.
🚀 Quick Start
```sh
# Run pre-trained adapter
git clone https://github.com/marknorgren/mlx-lora
cd mlx-lora
python demo.py "Who won in 2019?"

# Train your own
python prepare_data.py --input your_data.csv
mlx_lm.lora --train --data output.jsonl
```
📦 Repository Contents
- adapters/stanley_cup/ - Pre-trained LoRA weights
- prepare_data.py - Convert CSV/JSON to training format
- demo.py - Interactive query interface
- benchmark.py - Accuracy testing tools
Source: https://github.com/marknorgren/mlx-lora
🎯 Notes
Local fine-tuning on consumer hardware is a great way to iterate on small experiments like this one and keep learning.