LLM Notes
A few notes on the current state of large language models (LLMs) as of June 2025, focusing on their capabilities, use cases, and my personal recommendations.
Use Cases
Creative Writing / Brainstorming / Ideation / Divergent Thinking
For creative tasks, I recommend OpenAI’s o3 model.
Code Generation / Debugging / Refactoring / Structured Output / Documentation
My current go-to is Google’s Gemini 2.5 Pro model.
Note: The GitHub Copilot integration in VS Code can be flaky with Gemini 2.5 Pro, occasionally hitting errors and rate limits. Even so, Gemini’s large context window is a significant advantage for larger codebases.
I’ve used Anthropic’s Claude models extensively, though they’re no longer my first choice:
- Claude 3.5 - Relatively reliable and consistent
- Claude 3.7 - Performs well but tends to lose focus on complex tasks, so ample context and clear instructions matter
- Claude 4 - Shows improvement over 3.7, but lacks the consistency of 3.5
As a fallback for code generation and debugging when I hit the Gemini issues in VS Code, I switch to OpenAI’s GPT-4.1 model.
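The same primary-then-fallback habit can be scripted when calling models programmatically. The following is a minimal sketch, not a definitive implementation: `complete_with_fallback`, `flaky_primary`, and `steady_fallback` are hypothetical names, and the two stand-in functions only simulate a rate-limited primary and a working fallback. In real use they would wrap whatever client library you call.

```python
from typing import Callable, Sequence

# A provider is a (label, callable) pair; the callable takes a prompt and
# returns the completion text. Both are assumptions for illustration.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (label, completion) from the first that succeeds."""
    errors: list[str] = []
    for label, call in providers:
        try:
            return label, call(prompt)
        except Exception as exc:  # e.g. a rate limit or transient API error
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Stand-in for a primary model call that is currently rate limited."""
    raise TimeoutError("rate limited")


def steady_fallback(prompt: str) -> str:
    """Stand-in for a fallback model call that succeeds."""
    return f"refactored: {prompt}"


if __name__ == "__main__":
    used, text = complete_with_fallback(
        "rename foo to bar",
        [("gemini-2.5-pro", flaky_primary), ("gpt-4.1", steady_fallback)],
    )
    print(used, text)
```

Catching every exception keeps the sketch short; in practice you would likely catch only the rate-limit and timeout errors your client raises, so that genuine bugs still surface.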
2025-Q2 Gen-AI Guide
Vendor | Models (June 2025) | Max Context (tokens) | Best At |
---|---|---|---|
OpenAI | GPT-4.1 | 1M | Code & structured work, logic |
OpenAI | o3 | 200k | Creative writing, brainstorming, ideation |
Google | Gemini 2.5 Pro | 1M | Code, large codebases, large context prompts |
Google | Gemini 2.5 Flash | 1M | Speed! ⚡️ |
Anthropic | Claude 4.0 Sonnet | 200k | Code & structured work, logic |
Anthropic | Claude 3.7 Sonnet | 200k | Code & structured work, logic |
Anthropic | Claude 3 Opus | 200k | Code & structured work, logic |
Meta | Llama 4 Scout (open) | 10M/1M | Private LLM use cases, local deployment |
xAI | Grok 3 | 130k | |
DeepSeek | V3-0324 (open) | 64k | Cost-sensitive experimentation |
Model Provider Details
OpenAI
- o3 - Advanced creative writing, brainstorming, ideation
- GPT-4.1 - Code generation, debugging, refactoring, structured output, documentation
- GPT-4o - Multimodal, agent-ready, advanced reasoning
- GPT-4 - Previous generation, still capable
Anthropic
Anthropic’s models have been consistently strong for coding tasks.
- Claude (Wikipedia)
- Claude 4.0 Sonnet - Released May 22, 2025
- Claude 3.7 Sonnet - Released February 24, 2025
- Claude 3.5 Sonnet - Released June 20, 2024
Google
Google has made significant advances with its Gemini models in 2025 and, in my experience, now leads in code generation and large context processing.
- Gemini (Wikipedia)
- Gemini 2.5 Pro - Generally available June 2025 (experimental release March 2025)
Decoding OpenAI’s Labels
Label | Translation |
---|---|
4o | “o” = “omni”: multimodal & agent-ready |
o1 / o3 | Reasoning-focused “o-series”; here the “o” does not stand for “omni” |
4.1 | Latest GPT-4 weights (biggest brain) |
mini / nano | Fewer parameters → lower cost & latency |
Note: Snapshot suffixes (e.g., -0425) indicate the date of frozen weights; expect slight tone variations between snapshots.
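To make the labels concrete, here is a small sketch that splits a model identifier into a base name, an optional size tier (`mini` / `nano`), and an optional snapshot suffix. It rests on assumptions: the identifier `gpt-4.1-mini-0425` is a hypothetical example built from the suffix mentioned above, the snapshot is treated as an opaque four-digit string rather than interpreted as a date, and `parse_model_id` is an illustrative helper, not part of any vendor SDK.

```python
import re
from dataclasses import dataclass
from typing import Optional

# base name, then an optional "-mini"/"-nano" tier, then an optional
# four-digit snapshot suffix such as "-0425".
_PATTERN = re.compile(
    r"^(?P<base>.+?)(?:-(?P<tier>mini|nano))?(?:-(?P<snapshot>\d{4}))?$"
)


@dataclass(frozen=True)
class ModelLabel:
    base: str
    tier: Optional[str]      # "mini", "nano", or None for the full-size model
    snapshot: Optional[str]  # four-digit suffix, kept as a plain string


def parse_model_id(model_id: str) -> ModelLabel:
    """Split an identifier like 'gpt-4.1-mini-0425' into its labelled parts."""
    match = _PATTERN.match(model_id.strip())
    if match is None:
        raise ValueError(f"unrecognised model id: {model_id!r}")
    return ModelLabel(match["base"], match["tier"], match["snapshot"])


if __name__ == "__main__":
    for example in ("gpt-4.1-mini-0425", "gpt-4.1-nano", "o3"):
        print(example, "->", parse_model_id(example))
```

Real identifiers may follow other patterns (for example, full dates in the suffix), so treat the regular expression as a starting point to adapt.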
Key Takeaways
- Open-source models have matured significantly. Llama 4 Scout can handle many production RAG workloads.
- Benchmark performance can be misleading. Always test with your specific use cases; vendor capabilities evolve rapidly.
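- “Context” refers to how much text, measured in tokens, the model can process in a single request. The prompt, any attached source material, and typically the generated output all count toward the limit.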
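As a rough illustration of the context point, the sketch below estimates whether a piece of source material fits a model's window. The limits are taken from the guide table above; the four-characters-per-token ratio is only a common rule of thumb for English text, and `reserve_for_output` is an assumed allowance for the model's reply. For real decisions, count tokens with the vendor's own tokenizer.

```python
import math

# Context limits in tokens, copied from the guide table above.
CONTEXT_LIMITS = {
    "GPT-4.1": 1_000_000,
    "o3": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Claude 3.7 Sonnet": 200_000,
}

CHARS_PER_TOKEN = 4  # assumption: rough average for English prose


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def models_that_fit(text: str, reserve_for_output: int = 8_000) -> list[str]:
    """Return the models whose context window holds the text plus room for a reply."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


if __name__ == "__main__":
    codebase_dump = "x" * 1_200_000  # stand-in for about 1.2 MB of source text
    print(estimate_tokens(codebase_dump), models_that_fit(codebase_dump))
```

Code and non-English text often tokenize less efficiently than the ratio assumed here, so leave a generous margin.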