2026-05-15

| Model | Company | SWE-bench Verified | Release Date | Open/Closed |
|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | 80.9% | Nov 2025 | Closed |
| Claude Opus 4.6 | Anthropic | 80.8% | Feb 2026 | Closed |
| MiniMax M2.5 | MiniMax | 80.2% | ~Jan 2026 | Open |
| GPT-5.2 | OpenAI | 80.0% | ~Nov 2025 | Closed |
| GLM-5 | Zhipu AI | 77.8% | ~2025 | Closed |
| Claude Sonnet 4.5 | Anthropic | 77.2% | Nov 2025 | Closed |
| Gemini 3 Pro | Google | 76.2% | ~Late 2025 | Closed |
| DeepSeek V3.2 | DeepSeek | — | Dec 2025 | Open |
| Llama 4 Maverick | Meta | — | Apr 2025 | Open |

| Tool | Type | Powered By | Use Case |
|---|---|---|---|
| Claude Code | CLI agent | Claude Opus 4.6 | Terminal-based autonomous coding agent |
| GitHub Copilot | IDE extension | GPT-5.2-Codex | Inline completions + chat in VS Code/JetBrains |
| Cursor | AI-native IDE | Multiple (Claude, GPT) | Multi-file edits, composer mode |
| OpenAI Codex | Cloud agent | GPT-5.x | Background tasks on repos |
| Windsurf | AI-native IDE | Multiple | Agentic IDE, multi-file editing |
| Devin | Autonomous agent | Proprietary | Fully autonomous dev tasks |

Note: SWE-bench Verified measures the ability to resolve real GitHub issues. Scores are tightly clustered at the top (~76–81%), so practical differences depend heavily on workflow and tooling.
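
To make the clustering concrete, here is a minimal Python sketch that ranks the scored models from the table and computes each one's gap to the leader. The scores are copied from the table; the `summarize` helper and variable names are illustrative assumptions, not part of any benchmark tooling.

```python
# Scores copied from the table above (SWE-bench Verified, percent of issues resolved).
# Models without a reported score are left out.
scores = {
    "Claude Opus 4.5": 80.9,
    "Claude Opus 4.6": 80.8,
    "MiniMax M2.5": 80.2,
    "GPT-5.2": 80.0,
    "GLM-5": 77.8,
    "Claude Sonnet 4.5": 77.2,
    "Gemini 3 Pro": 76.2,
}


def summarize(scores):
    """Return models ranked best-first and the gap between best and worst score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    spread = round(ranked[0][1] - ranked[-1][1], 1)
    return ranked, spread


ranked, spread = summarize(scores)
leader_score = ranked[0][1]
for model, score in ranked:
    gap = round(leader_score - score, 1)
    print(f"{model:<18} {score:.1f}%  (gap to leader: {gap:.1f} points)")
print(f"Spread across scored models: {spread:.1f} points")
```

The spread across all seven scored models comes out to under five percentage points, which is why the choice of tool and workflow may matter more in practice than the choice among the top models.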

©️ Neil Ernst