MLX LoRA
🧠 Fine-tuning LLMs on Apple Silicon with LoRA
Teaching a 7B-parameter model about Stanley Cup winners on an Apple Silicon laptop: about 45 minutes of training, producing a 6.5MB adapter.
📊 The Setup
Base models have knowledge cutoffs. Mistral-7B trained through 2023 can't answer questions about 2024 events. Full fine-tuning needs 100GB+ of GPU memory and rewrites billions of parameters.
LoRA freezes the base weights and trains two small low-rank matrices whose product nudges the model's behavior. Think editing a config file instead of recompiling the kernel.
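Concretely, a frozen weight matrix W gets a trainable rank-r update ΔW = B·A. A minimal sketch of the idea (illustrative only, not mlx-lm's actual implementation; the alpha value is an assumption):

```python
# Minimal sketch of the LoRA update (illustrative; not mlx-lm's actual code).
import numpy as np

d, r = 4096, 16                    # hidden size, LoRA rank
alpha = 32                         # scaling hyperparameter (assumed value)
W = np.random.randn(d, d)          # frozen base weight: never updated
A = np.random.randn(r, d) * 0.01   # trainable, small random init
B = np.zeros((d, r))               # trainable, zero init so training starts at W

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * B @ A).T, but without
    # ever materializing the full-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Only A and B train: 2 * d * r = 131,072 params vs d * d = 16.8M for this layer.
```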
🏒 What I Built
Fine-tuned Mistral-7B on NHL Stanley Cup data from 1915-2025 after watching Apple’s WWDC 2025 demo on football fine-tuning.
```
# Before
Q: Who won the Stanley Cup in 2024?
A: I cannot provide information about events after 2023.

# After
Q: Who won the Stanley Cup in 2024?
A: The Florida Panthers won the Stanley Cup in 2024, defeating the Edmonton Oilers 4-3.
```
⚡ Performance Metrics
- Training time: 45 minutes on my Apple Silicon laptop (M4 Max)
- Adapter size: 6.5MB (0.17% of base model)
- Training data examples: ~127
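As a rough sanity check on that adapter size (assuming LoRA on the q/v attention projections of the last 16 layers, which I believe is mlx-lm's default, stored at fp16):

```python
# Back-of-envelope check of the 6.5MB adapter size (assumptions in comments).
hidden = 4096      # Mistral-7B hidden size
kv_dim = 1024      # v_proj output dim (8 KV heads x 128, grouped-query attention)
rank = 16
layers = 16        # mlx-lm applies LoRA to the last 16 layers by default (assumed)

q_params = rank * hidden + hidden * rank   # A and B for q_proj
v_params = rank * hidden + kv_dim * rank   # A and B for v_proj
total = layers * (q_params + v_params)
print(f"{total:,} params -> {total * 2 / 1e6:.1f} MB at fp16")
# 3,407,872 params -> 6.8 MB, in the same ballpark as the reported 6.5MB
```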
🛠️ Technical Details
Data Format
```json
{
  "messages": [
    { "role": "user", "content": "Who won the Stanley Cup in 2024?" },
    {
      "role": "assistant",
      "content": "In 2024, the Florida Panthers won the Stanley Cup, defeating the Edmonton Oilers with a series score of 4-3."
    }
  ]
}
```
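Each record like this goes on its own line of the .jsonl training file. A sketch of the conversion (the repo's prepare_data.py may differ; the CSV column names here are assumptions):

```python
# Sketch: convert a CSV of Cup winners into chat-format JSONL.
# Column names (year, winner, loser, series) are assumed, not the repo's schema.
import csv
import json

with open("stanley_cup.csv") as src, open("stanley_cup.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        record = {
            "messages": [
                {"role": "user",
                 "content": f"Who won the Stanley Cup in {row['year']}?"},
                {"role": "assistant",
                 "content": f"In {row['year']}, the {row['winner']} won the Stanley Cup, "
                            f"defeating the {row['loser']} with a series score of {row['series']}."},
            ]
        }
        dst.write(json.dumps(record) + "\n")
```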
Optimal Parameters
```sh
mlx_lm.lora \
  --model "mlx-community/Mistral-7B-Instruct-v0.3-4bit" \
  --train \
  --data stanley_cup.jsonl \
  --adapter-path adapters/stanley_cup \
  --iters 2500 \
  --learning-rate 1.5e-5 \
  --lora-ranks 16 \
  --lora-dropout 0.1
```
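Note: in recent mlx-lm releases, LoRA rank and dropout may need to go in a YAML config passed via --config rather than CLI flags. A sketch, assuming that layout (the key names are an assumption; check `mlx_lm.lora --help` for your version):

```sh
# If your mlx-lm version rejects --lora-ranks / --lora-dropout, try a config file.
cat > lora_config.yaml <<'EOF'
lora_parameters:
  rank: 16
  dropout: 0.1
EOF
mlx_lm.lora --config lora_config.yaml --train --data stanley_cup.jsonl
```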
🔍 Key Findings
Unified memory is a real Apple Silicon advantage. MLX leverages the shared-memory architecture, so there is no CPU/GPU copy overhead.
Higher rank helps domain-specific tasks. Rank 16 improved accuracy significantly over the default of 8.
💡 Practical Applications
LoRA works well for teaching a model narrow tasks, such as grammar correction or generating product descriptions from names and categories.
LoRA adapters are tiny (a few MB) and can be swapped in and out as needed, so one base model can serve several task-specific adapters, as in the sketch below.
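For example, with mlx-lm's Python API (a minimal sketch; chat templating is omitted, and the adapter path is the one from this repo):

```python
# Sketch: serve one base model with a swappable adapter via mlx-lm's Python API.
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    adapter_path="adapters/stanley_cup",  # point at a different adapter to switch tasks
)
print(generate(model, tokenizer, prompt="Who won the Stanley Cup in 2024?",
               max_tokens=64))
```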
⚠️ Limitations
LoRA is a low-rank approximation, which limits how far the model can adapt: it is better at absorbing new facts than at learning genuinely new skills.
Fine-tuning doesn't guarantee correct factual recall, especially when questions are phrased differently from the training examples. The model may know "Florida Panthers won in 2024" but fail when asked "2024 Stanley Cup champions?"
LoRA can also introduce "intruder dimensions" that degrade performance on general tasks. Training too long hurts, and the learning rate needs careful tuning.
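One way to probe the paraphrase sensitivity mentioned above, in the spirit of benchmark.py (a hypothetical snippet with naive substring grading):

```python
# Hypothetical probe: ask the same fact several ways, grade by substring match.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit",
                        adapter_path="adapters/stanley_cup")
paraphrases = [
    "Who won the Stanley Cup in 2024?",
    "2024 Stanley Cup champions?",
    "Which team won the Cup in 2024?",
]
for q in paraphrases:
    answer = generate(model, tokenizer, prompt=q, max_tokens=64)
    print("PASS" if "Florida Panthers" in answer else "FAIL", "|", q)
```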
🔮 Future
For reliable factual retrieval, I'm exploring RAG (Retrieval-Augmented Generation) and agentic approaches that can query structured data sources.
🚀 Quick Start
```sh
# Run pre-trained adapter
git clone https://github.com/marknorgren/mlx-lora
cd mlx-lora
python demo.py "Who won in 2019?"

# Train your own
python prepare_data.py --input your_data.csv
mlx_lm.lora --train --data output.jsonl
```
📦 Repository Contents
- adapters/stanley_cup/ - Pre-trained LoRA weights
- prepare_data.py - Convert CSV/JSON to training format
- demo.py - Interactive query interface
- benchmark.py - Accuracy testing tools
Source: https://github.com/marknorgren/mlx-lora
🎯 Notes
Local fine-tuning on consumer hardware is a great way to iterate on small experiments like this one and keep learning.