Sonnet 4.6 widens its lead, but only inside Claude Code — three layer-aware deltas this week.
LLM · Layer 1
Sonnet 4.6 widens its cross-source lead over GPT-5 mini to 3.3 points on the quality index, driven by a +1.4 move on LMSYS (week 18) and +0.9 on SWE-bench Verified.
+1.4
Agent · Layer 2
Sonnet 4.6 + Claude Code is now 14 points ahead of GPT-5 mini + Codex CLI on SWE-bench Verified, but the gap shrinks to 3 points when both models run inside the plain SDK.
+2.1
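If model and harness effects add up (an assumption; the digest does not say they do), the two Agent-layer gaps can be split apart. Running both models in the plain SDK isolates the model gap, so what remains of the 14-point lead belongs to the Claude Code vs Codex CLI pairing. The sketch below does that subtraction. Only the 14 and 3 come from the digest. `harness_share` is a hypothetical helper name.

```python
# Minimal sketch, assuming SWE-bench Verified scores decompose additively
# into a model effect plus a harness effect. Only the two gaps below come
# from the digest; the helper name is hypothetical.

def harness_share(combined_gap: float, sdk_gap: float) -> float:
    """Points of a model+harness gap not explained by the model alone.

    combined_gap: lead of (model A + harness A) over (model B + harness B)
    sdk_gap:      lead of model A over model B when both run in the plain SDK
    """
    return combined_gap - sdk_gap


combined = 14.0  # Sonnet 4.6 + Claude Code vs GPT-5 mini + Codex CLI
sdk_only = 3.0   # the same two models, both inside the plain SDK

attributed_to_harness = harness_share(combined, sdk_only)
print(f"harness pairing accounts for ~{attributed_to_harness:.1f} of {combined:.1f} points")
```

Under that additive assumption, about 11 of the 14 points come from the harness pairing rather than the models. Interaction effects between a model and its harness would break the simple subtraction.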
Harness · Layer 3
Cursor's harness gain over the plain SDK on Sonnet 4.6 is now 8.7 points, smaller than Claude Code's +14.0 and Codex CLI's +10.6. Cursor's gap to the other two harnesses is widening, not narrowing.
+0.4
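The Harness-layer comparison reduces to subtracting one harness gain from another. The sketch below assumes "harness gain" means a harness's SWE-bench Verified score minus the plain-SDK score for the same model. It uses the three gains exactly as quoted above and shows how far Claude Code and Codex CLI sit ahead of Cursor. `HARNESS_GAINS` and `lead_over` are hypothetical names.

```python
# Sketch, assuming "harness gain" = (score inside the harness) minus
# (score in the plain SDK) for the same model. The three gains are the
# digest's figures; the dict and helper names are hypothetical.

HARNESS_GAINS = {  # SWE-bench Verified points over the plain SDK
    "Claude Code": 14.0,
    "Codex CLI": 10.6,
    "Cursor": 8.7,
}


def lead_over(reference: str, gains: dict[str, float]) -> dict[str, float]:
    """Points by which each other harness's gain exceeds `reference`'s gain."""
    return {
        name: round(gain - gains[reference], 1)  # round away float noise
        for name, gain in gains.items()
        if name != reference
    }


print(lead_over("Cursor", HARNESS_GAINS))
```

With these figures Cursor trails Claude Code by 5.3 points and Codex CLI by 1.9. Tracking those two numbers week over week is one way to check the "widening, not narrowing" claim.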
Methodology
AgentBench v3 bumped its task split this week. We have re-ingested the affected history, and affected rows carry a 'methodology bumped' footnote for the next 30 days.
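The 30-day footnote rule fits in a single date comparison. The sketch below is a hypothetical version of it. It assumes a half-open window that starts on the day of the bump and ends 30 days later, because the digest does not say whether day 30 itself still shows the footnote. `shows_methodology_footnote` and the example dates are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical rule, assuming the footnote shows for a half-open 30-day
# window starting on the day the benchmark's methodology was bumped.
# The function name and example dates are illustrative, not from the digest.
FOOTNOTE_WINDOW = timedelta(days=30)


def shows_methodology_footnote(bump_date: date, today: date) -> bool:
    """True while `today` falls in [bump_date, bump_date + 30 days)."""
    return bump_date <= today < bump_date + FOOTNOTE_WINDOW


bump = date(2025, 5, 1)  # illustrative bump date only
print(shows_methodology_footnote(bump, date(2025, 5, 15)))
print(shows_methodology_footnote(bump, date(2025, 5, 31)))
```

A half-open interval means consecutive windows never overlap on a boundary day. If the footnote should also appear on day 30, change `<` to `<=`.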
Note
Permalink: benchmark-intel.prin7r.com/archive/wk18 · Sources cited inline · No vendor sponsorship.