MLX LoRA
Fine-tuning LLMs on Apple Silicon with LoRA
Teaching a 7B-parameter model about Stanley Cup winners on an Apple Silicon laptop - about 45 minutes of training, producing a 6.5MB adapter.
Setup
Base models have knowledge cutoffs: Mistral-7B, trained on data through 2023, can’t answer questions about 2024 events. Full fine-tuning would fix that, but it needs 100GB+ of GPU memory and rewrites billions of parameters.
LoRA trains two small matrices that modify model behavior. Think editing a config file instead of recompiling the kernel.
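Concretely, LoRA keeps the original weight matrix W frozen and learns a low-rank update ΔW = B·A, where A and B have only r rows or columns each. A minimal NumPy sketch of the idea - the sizes and scaling below are illustrative, not MLX's actual implementation:

import numpy as np

d, r, alpha = 4096, 16, 32            # hidden size, LoRA rank, scaling factor (illustrative values)
W = np.random.randn(d, d) * 0.02      # frozen base weight - never updated during fine-tuning

A = np.random.randn(r, d) * 0.01      # trainable "down" projection (r x d)
B = np.zeros((d, r))                  # trainable "up" projection, zero-initialized so training starts at W

def lora_forward(x):
    # Base path plus low-rank correction: W @ x + (alpha / r) * B @ (A @ x)
    return W @ x + (alpha / r) * (B @ (A @ x))

y = lora_forward(np.random.randn(d))
# Only A and B are trained: 2 * d * r parameters instead of the d * d in W.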
🏒 Building
I fine-tuned Mistral-7B on NHL Stanley Cup data from 1915-2025 after watching Apple’s WWDC 2025 demo on football fine-tuning.
# Before
Q: Who won the Stanley Cup in 2024?
A: I cannot provide information about events after 2023.
# After
Q: Who won the Stanley Cup in 2024?
A: The Florida Panthers won the Stanley Cup in 2024, defeating the Edmonton Oilers 4-3.
Performance Metrics
- Training time: 45 minutes on my Apple Silicon laptop (M4 Max)
- Adapter size: 6.5MB (0.17% of base model)
- Training data examples: ~127
🛠️ Technical Details
Data Format
{
  "messages": [
    { "role": "user", "content": "Who won the Stanley Cup in 2024?" },
    {
      "role": "assistant",
      "content": "In 2024, the Florida Panthers won the Stanley Cup, defeating the Edmonton Oilers with a series score of 4-3."
    }
  ]
}
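mlx_lm.lora expects this chat format as one JSON object per line (JSONL). A rough sketch of turning a results CSV into that format - the column names (year, winner, runner_up, score) are assumptions for illustration, not the actual schema prepare_data.py uses:

import csv
import json

# Hypothetical input columns: year, winner, runner_up, score (e.g. "4-3")
with open("stanley_cup.csv") as src, open("stanley_cup.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        example = {
            "messages": [
                {"role": "user",
                 "content": f"Who won the Stanley Cup in {row['year']}?"},
                {"role": "assistant",
                 "content": (f"In {row['year']}, the {row['winner']} won the Stanley Cup, "
                             f"defeating the {row['runner_up']} with a series score of {row['score']}.")},
            ]
        }
        dst.write(json.dumps(example) + "\n")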
Optimal Parameters
mlx_lm.lora \
--model "mlx-community/Mistral-7B-Instruct-v0.3-4bit" \
--train \
--data stanley_cup.jsonl \
--adapter-path adapters/stanley_cup \
--iters 2500 \
--learning-rate 1.5e-5 \
--lora-ranks 16 \
--lora-dropout 0.1
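Once training finishes, the adapter can be loaded on top of the quantized base model from Python to spot-check answers. A sketch using mlx_lm's load/generate helpers (exact argument names can vary between mlx-lm versions):

from mlx_lm import load, generate

# Load the 4-bit base model and apply the trained LoRA adapter on top.
model, tokenizer = load(
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    adapter_path="adapters/stanley_cup",
)

# Wrap the question in the chat template so it matches the training format.
messages = [{"role": "user", "content": "Who won the Stanley Cup in 2024?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=100))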
🔍 Key Findings
Unified memory is a real Apple Silicon advantage. MLX leverages the unified memory architecture, so there is no CPU/GPU copying overhead.
Higher rank helps domain-specific tasks. Raising the LoRA rank from the default 8 to 16 improved accuracy significantly.
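A back-of-the-envelope way to see the rank/size trade-off: each adapted square projection of width d adds 2 * d * r trainable parameters, so doubling the rank doubles the adapter. The layer and projection counts below are assumptions for illustration, not MLX's exact defaults:

def lora_param_count(hidden=4096, rank=16, layers=16, projections_per_layer=2):
    # Rough count of trainable LoRA parameters. Assumes LoRA is applied to
    # `projections_per_layer` square (hidden x hidden) projections in each of
    # `layers` transformer layers - an illustrative simplification; the on-disk
    # adapter size also depends on which layers are adapted and the save precision.
    per_projection = 2 * hidden * rank   # A (rank x hidden) + B (hidden x rank)
    return layers * projections_per_layer * per_projection

for r in (8, 16, 32):
    print(f"rank {r:2d}: ~{lora_param_count(rank=r) / 1e6:.2f}M trainable parameters")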
Practical Applications
Works well for teaching models specific tasks like grammar correction or generating product descriptions from names and categories.
LoRA adapters are tiny (a few MB) and can be swapped in and out as needed - useful for deploying one base model with different task-specific adapters.
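In practice that can mean keeping a single base model and picking the adapter per task at load time. A sketch of the pattern - only adapters/stanley_cup exists in this repo; the other adapter paths are hypothetical:

from mlx_lm import load

BASE = "mlx-community/Mistral-7B-Instruct-v0.3-4bit"
ADAPTERS = {
    "stanley_cup": "adapters/stanley_cup",   # trained in this repo
    "grammar": "adapters/grammar",           # hypothetical task-specific adapters
    "product_copy": "adapters/product_copy",
}

def load_for_task(task):
    # Same base weights every time; only the few-MB adapter changes.
    return load(BASE, adapter_path=ADAPTERS[task])

model, tokenizer = load_for_task("stanley_cup")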
Limitations
LoRA can’t teach models new skills - just new facts. It’s good for “Panthers won in 2024” but struggles with reasoning or complex tasks.
The model gets brittle with different phrasings. Ask “Who won in 2024?” and it works. Ask “2024 Stanley Cup champions?” and it might fail.
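One cheap way to surface this brittleness is to ask the same fact several ways and count how often the expected answer appears. A sketch of that check - ask_model is a placeholder for whatever generation call you use (e.g. the mlx_lm snippet above), not a function in this repo:

# Hypothetical paraphrase check: one fact, several phrasings.
PARAPHRASES = [
    "Who won the Stanley Cup in 2024?",
    "2024 Stanley Cup champions?",
    "Which team lifted the Cup in 2024?",
    "Who were the 2024 NHL champions?",
]
EXPECTED = "Panthers"

def paraphrase_accuracy(ask_model):
    # ask_model: callable taking a question string and returning the model's answer text.
    hits = sum(EXPECTED.lower() in ask_model(q).lower() for q in PARAPHRASES)
    return hits / len(PARAPHRASES)

# e.g. accuracy = paraphrase_accuracy(ask_model)  # 1.0 means every phrasing succeeded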
Train too long and you’ll break the base model’s general abilities. I had to tune the learning rate carefully to avoid this.
Future
For reliable factual retrieval, I’m exploring RAG (Retrieval-Augmented Generation) and agentic approaches that can query structured data sources.
Quick Start
# Run pre-trained adapter
git clone https://github.com/marknorgren/mlx-lora
cd mlx-lora
python demo.py "Who won in 2019?"
# Train your own
python prepare_data.py --input your_data.csv
mlx_lm.lora --train --data output.jsonl
Repository Contents
- adapters/stanley_cup/ - Pre-trained LoRA weights
- prepare_data.py - Convert CSV/JSON to training format
- demo.py - Interactive query interface
- benchmark.py - Accuracy testing tools
Source: https://github.com/marknorgren/mlx-lora
Notes
Local fine-tuning on consumer hardware is great for iterating quickly on small experiments like this and for learning how the pieces fit together!