LLM Notes

A few notes on the current state of large language models (LLMs) as of June 2025, focusing on their capabilities, use cases, and my personal recommendations.

Use Cases

Creative Writing / Brainstorming / Ideation / Divergent Thinking

For creative tasks, I recommend OpenAI’s o3 model.

Code Generation / Debugging / Refactoring / Structured Output / Documentation

My current go-to is Google’s Gemini 2.5 Pro model.

Note: VSCode Copilot integration can be flaky with Gemini 2.5 Pro, occasionally hitting errors and rate limits. However, Gemini’s large context window provides a significant advantage for larger codebases.

I’ve used Anthropic’s Claude models extensively, though they’re no longer my first choice:

  • Claude 3.5 - Relatively reliable and consistent.
  • Claude 3.7 - Performs well but tends to lose focus on complex tasks, so clear instructions and sufficient context are important.
  • Claude 4 - Shows improvement over 3.7, but lacks the consistency of 3.5.

As a fallback for code generation and debugging when hitting VSCode’s Gemini issues, I switch to OpenAI’s GPT-4.1 model.
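
A minimal sketch of that fallback habit, in case it helps to automate it outside the editor. Everything here is an assumption for illustration: `ModelUnavailable`, `complete_with_fallback`, and the two stub clients are hypothetical names, and real clients would wrap each vendor's SDK rather than return canned strings.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a client wrapper on a rate limit or transient error (hypothetical)."""


def complete_with_fallback(
    prompt: str,
    clients: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first that works."""
    failures = []
    for name, call in clients:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            # Remember why this model was skipped, then move on to the next one.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(failures))


# Stand-in clients for illustration only; they do not call any real API.
def flaky_gemini(prompt: str) -> str:
    raise ModelUnavailable("rate limited")


def stub_gpt41(prompt: str) -> str:
    return f"[gpt-4.1 stub] {prompt}"


used, reply = complete_with_fallback(
    "Refactor this function",
    [("gemini-2.5-pro", flaky_gemini), ("gpt-4.1", stub_gpt41)],
)
print(used)  # gpt-4.1
```

The order of the list encodes the preference, so swapping the first choice is a one-line change.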

2025-Q2 Gen-AI Guide

| Vendor | Model (June 2025) | Max Context (tokens) | Best At |
|---|---|---|---|
| OpenAI | GPT-4.1 | 1M | Code & structured work, logic |
| OpenAI | o3 | 200k | Creative writing, brainstorming, ideation |
| Google | Gemini 2.5 Pro | 1M | Code, large codebases, large context prompts |
| Google | Gemini 2.5 Flash | 128k | Speed! ⚡️ |
| Anthropic | Claude 4.0 Sonnet | 200k | Code & structured work, logic |
| Anthropic | Claude 3.7 Sonnet | 200k | Code & structured work, logic |
| Anthropic | Claude 3 Opus | 200k | Code & structured work, logic |
| Meta | Llama 4 Scout (open) | 10M/1M | Private LLM use cases, local deployment |
| xAI | Grok 3 | 130k | |
| DeepSeek | V3-0324 (open) | 64k | Cost-sensitive experimentation |

Model Provider Details

OpenAI

  • o3 - Advanced creative writing, brainstorming, ideation
  • GPT-4.1 - Code generation, debugging, refactoring, structured output, documentation
  • GPT-4o - Multimodal, agent-ready, advanced reasoning
  • GPT-4 - Previous generation, still capable

Anthropic

Anthropic’s models have been consistently strong for coding tasks.

  • Claude (Wikipedia)
  • Claude 4.0 Sonnet - Released May 22, 2025
  • Claude 3.7 Sonnet - Released February 24, 2025
  • Claude 3.5 Sonnet - Released June 20, 2024

Google

Google has made significant advances with its Gemini models in 2025. In my experience they now lead in code generation and large context processing.

Decoding OpenAI’s Labels

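| Label | Translation |
|---|---|
| o (as in GPT-4o) | “Omni” = multimodal & agent-ready. Not to be confused with the o-series reasoning models such as o3. |
| 4.1 | Latest GPT-4 weights (biggest brain) |
| mini / nano | Fewer parameters → lower cost & latency |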

Note: Snapshot suffixes (e.g., -0425) indicate the date of frozen weights; expect slight tone variations between snapshots.

Key Takeaways

  1. Open-source models have matured significantly. Llama 4 Scout can handle many production RAG workloads.
  2. Benchmark performance can be misleading. Always test with your specific use cases; vendor capabilities evolve rapidly.
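
One way to act on the second takeaway is a tiny pass-rate harness over your own prompts. This is only a sketch under stated assumptions: the cases in `CASES`, the `pass_rate` helper, and the two stub models are hypothetical, and a real run would replace the stubs with functions that call the models you want to compare.

```python
from typing import Callable

# Hypothetical test cases: a prompt plus a check the reply must pass.
CASES: list[tuple[str, Callable[[str], bool]]] = [
    ("Return the word OK", lambda reply: "OK" in reply),
    ("What is 2 + 2?", lambda reply: "4" in reply),
]


def pass_rate(model: Callable[[str], str]) -> float:
    """Fraction of CASES whose check accepts the model's reply."""
    passed = sum(1 for prompt, check in CASES if check(model(prompt)))
    return passed / len(CASES)


# Stand-in models for illustration only; they do not call any real API.
def helpful_stub(prompt: str) -> str:
    return "OK, the answer is 4"


def unhelpful_stub(prompt: str) -> str:
    return "I don't know"


print(pass_rate(helpful_stub))    # 1.0
print(pass_rate(unhelpful_stub))  # 0.0
```

Substring checks are deliberately crude. The point is that the cases come from your workload, not from a public benchmark.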

Note: “Context” refers to how much source material the model can process in a single request.
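
To make that concrete, here is a rough sketch that checks which models could take a given amount of source material in one request. The limits are copied from the table in this post. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text, not a real tokenizer), and `estimate_tokens`, `models_that_fit`, and the `reserve` parameter are hypothetical names.

```python
# Context limits in tokens, copied from the table in this post.
CONTEXT_LIMITS = {
    "GPT-4.1": 1_000_000,
    "o3": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Claude 4.0 Sonnet": 200_000,
}

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not an actual tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(text: str, reserve: int = 8_000) -> list[str]:
    """Models whose window holds the text plus `reserve` tokens for the reply."""
    needed = estimate_tokens(text) + reserve
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


source = "x" * 2_000_000  # stand-in for roughly 2 MB of source material
print(estimate_tokens(source))  # 500000
print(models_that_fit(source))  # ['GPT-4.1', 'Gemini 2.5 Pro']
```

For anything close to a limit, count tokens with the vendor's own tokenizer instead of this estimate.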

© Mark Norgren. Some rights reserved.

Build Date: 2025-06-06
