LLM Notes

A few notes and links for large language models (LLMs) as of July 2025.

Use Cases

Creative Writing / Brainstorming / Ideation / Divergent Thinking

For creative tasks, I recommend OpenAI’s o3 model.

Code Generation / Debugging / Refactoring / Structured Output / Documentation

My current go-to is Google’s Gemini 2.5 Pro model.

Note: VSCode Copilot integration can be flaky with Gemini 2.5 Pro, occasionally hitting errors and rate limits. However, Gemini’s large context window provides a significant advantage for larger codebases.

I’ve used Anthropic’s Claude models extensively, though they’re no longer my first choice:

Claude 3.5 - Relatively reliable and consistent
Claude 3.7 - Performs well but tends to lose focus on complex tasks - context and clear instructions are important.
Claude 4 - Shows improvement over 3.7, but lacks the consistency of 3.5

As a fallback for code generation and debugging when hitting VSCode’s Gemini issues, I switch to OpenAI’s GPT-4.1 model.

2025-Q3 Gen-AI Guide

Vendor	Models (June 2025)	Max Context	Best At
OpenAI	GPT-4.1	1M tokens	Code & structured work, logic
	o3	200k	Creative writing, brainstorming, ideation
Google	Gemini 2.5 Pro	1M	Code, large codebases, large context prompts
	Gemini 2.5 Flash (docs)	128k	Speed! ⚡️
Anthropic	Claude 4 Sonnet	200k	Code & structured work, logic
	Claude 3.7 Sonnet	200k	Code & structured work, logic
Meta	Llama 4 Scout (open)	10M/1M	Open Weight LLM
xAI	Grok 3	130k	Reasoning, code & structured work, logic
	Grok 4	256k	Advanced reasoning, coding, multimodal
DeepSeek	V3-0324, R1-0528	64k	Cost-sensitive experimentation

Model Provider Details

OpenAI

o3 - Advanced creative writing, brainstorming, ideation
GPT-4.1 - Code generation, debugging, refactoring, structured output, documentation
GPT-4o - Multimodal, agent-ready, advanced reasoning
GPT-4 - Previous generation, still capable

Anthropic

Anthropic’s models have been consistently strong for coding tasks. View their system cards for detailed capabilities.

Claude (Wikipedia)
Claude 4 Sonnet - Released July 3, 2025
Claude 3.7 Sonnet - Released February 24, 2025 (system card)
Claude 3.5 Sonnet - Released June 20, 2024

Google

Google has made significant advances with their Gemini models in 2025, now leading in code generation and large context processing.

Gemini (Wikipedia)
Gemini 2.5 Pro - Released June 2025 (model card)

Decoding OpenAI’s Labels

Label	Translation
o*	“Omni” = multimodal & agent-ready
4.1	Latest GPT-4 weights (biggest brain)
mini / nano	Fewer parameters → lower cost & latency

Note: Snapshot suffixes (e.g., -0425) indicate the date of frozen weights; expect slight tone variations between snapshots.

Provider Apps and Links

Google Apps

AI Studio - Google’s latest AI tools, including Gemini 2.5 Pro
Google LM Notebook - useful for using a model with a collection of resources like PDFs, text, Youtube videos.

OpenAI Apps

Anthropic Apps

Anthropic Claude

Meta Links

Meta Llama (Llama 4 Model Card)

Grok Links

xAI Grok

Key Takeaways

Open-source models have matured significantly. Llama 4 Scout can handle many production RAG workloads.
Benchmark performance can be misleading. Always test with your specific use cases; vendor capabilities evolve rapidly.

“Context” refers to how much source material the model can process in a single request. For Anthropic models, see their context length documentation.