Claude Sonnet 5 for AI coding in 2026: the tokenizer tax nobody mentions, real cost math, and the Cursor and Copilot setup
TL;DR: Claude Sonnet 5 (released June 30, 2026) lands 63.2% on agentic coding — close to Opus 4.8’s 69.2% at a fraction of the price. Introductory API pricing is $2/$10 per million tokens through August 31, then $3/$15. The catch: a new tokenizer counts ~30% more tokens for the same code, so the real discount is smaller than the sticker.
| | Sonnet 5 (intro) | Opus 4.8 | Sonnet 4.6 |
|---|---|---|---|
| Best for | Daily agentic coding on a budget | Hardest multi-file refactors | Locked, predictable token budgets |
| Price / MTok | $2 in / $10 out (→ $3/$15 Sep 1) | $5 in / $25 out | $3 in / $15 out |
| Agentic coding (SWE-bench Pro) | 63.2% | 69.2% | 58.1% |
| The catch | ~30% more tokens per task (new tokenizer) | 2.5× the output price | Older model, no new tokenizer gains |
Honest take: Through August 31, Sonnet 5 is the default coding backend to reach for. You get ~91% of Opus 4.8’s agentic score for roughly 40% of the cost. From September 1 the math tightens, and heavy agentic users should re-run their own numbers against Opus 4.8 and GPT-5.5 before committing.
Anthropic shipped Claude Sonnet 5 on June 30, 2026, and it went live the same day inside Cursor, Claude Code, and GitHub Copilot. The headline is a Sonnet-class model that gets within striking distance of Opus 4.8 on coding, priced below Sonnet 4.6 during the launch window. That part is real. But there is a detail buried in the docs that changes the cost story for anyone running agentic loops all day, and most launch-day writeups skipped it. That is what this article is actually about.
What actually changed from Sonnet 4.6
Sonnet 5 is a drop-in replacement for Sonnet 4.6 — same API shape, same tools, same response format. The model ID is claude-sonnet-5. It ships with a 1M-token context window (both default and maximum) and 128k max output tokens. Three behavior changes matter when you wire it into a coding tool:
- Adaptive thinking is on by default. On Sonnet 4.6, a request with no thinking field ran without thinking. On Sonnet 5, that same request now runs with adaptive thinking. Because max_tokens is a hard cap on total output (thinking plus visible text), a limit you tuned for 4.6 can silently truncate answers on 5.
- Manual extended thinking is gone. Passing thinking: {type: "enabled", budget_tokens: N} now returns a 400 error. Use adaptive thinking with the effort parameter instead.
- Sampling parameters are rejected. Setting temperature, top_p, or top_k to any non-default value returns a 400. This already applied to Opus-class models; Sonnet 5 is the first Sonnet to enforce it.
The benchmark gains are concentrated exactly where a coding tool spends its time. On SWE-bench Pro (agentic coding), Sonnet 5 scores 63.2%, up from 58.1% on Sonnet 4.6 and closing on Opus 4.8’s 69.2%. On Terminal-Bench 2.1 — sequential multi-step shell tasks, the closest public proxy for how a CLI agent actually behaves — it jumps to 80.4% from 67.0%. On OSWorld-Verified computer use it moves to 81.2% from 78.5%. On CursorBench, Cursor’s own production benchmark, Sonnet 5 hits 57% versus 49% for Sonnet 4.6.
The tokenizer tax: why the sticker price lies a little
Here is the part that changes the budget. Sonnet 5 uses a new tokenizer — the same one Opus 4.7 and later use. Anthropic states it plainly in the docs: the same input text produces approximately 30% more tokens than on Sonnet 4.6. Per-token pricing is unchanged, but you are billed per token, and there are now ~30% more of them for the identical file, diff, or prompt.
Play that forward. During the introductory window, Sonnet 5 input is $2/MTok versus Sonnet 4.6’s $3/MTok — that looks like a 33% cut. But feed both models the same 500-line React component and Sonnet 5 counts ~30% more tokens for it. The real per-task saving during the promo is closer to ~13%, not 33%.
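The arithmetic behind that figure is short enough to check yourself. A minimal sketch, assuming the ~30% tokenizer inflation applies uniformly to the whole task; the function name and the uniform-inflation assumption are illustrative and not taken from Anthropic's docs.

```python
def relative_task_cost(new_price, old_price, token_inflation=0.30):
    """Cost of a task on the new model as a fraction of its cost on the old one.

    Assumes the same text yields (1 + token_inflation) times as many tokens
    on the new model, and that both prices are per-token rates in the same unit.
    """
    return (new_price / old_price) * (1 + token_inflation)


# Promo window: $2 vs $3 per MTok input
promo = relative_task_cost(new_price=2, old_price=3)
print(f"promo: {promo:.3f} of the Sonnet 4.6 cost, saving {1 - promo:.1%}")

# Equal per-token prices: $3 vs $3 per MTok input
standard = relative_task_cost(new_price=3, old_price=3)
print(f"standard: {standard:.3f} of the Sonnet 4.6 cost, premium {standard - 1:.1%}")
```

The promo case works out to roughly 0.87 of the old cost, which is where the ~13% saving comes from. Swap in your own measured inflation figure if your codebase tokenizes differently.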
From September 1, Sonnet 5 moves to its standard $3/$15 pricing, identical per-token to Sonnet 4.6. Same per-token price with ~30% more tokens per task means an equivalent coding session costs roughly 30% more on Sonnet 5 than it did on Sonnet 4.6. You are paying that premium for a genuinely better model, but you should know you are paying it, because nobody’s launch banner says so.
This is the single most important number for anyone whose Cursor or Cline agent chews through context on every turn. Recount your prompts under the new tokenizer before you assume your monthly bill holds steady. Anthropic’s token-counting endpoint will give you the real Sonnet 5 count; do not reuse figures you measured against 4.6.
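One way to do that recount is with the Anthropic Python SDK's messages.count_tokens method. This is a sketch: the helper names are hypothetical, and it assumes the token-counting endpoint accepts both model IDs on your account.

```python
def recount_prompt(client, messages, models=("claude-sonnet-4-6", "claude-sonnet-5")):
    """Return {model_id: input_token_count} for the same messages.

    `client` is expected to be an anthropic.Anthropic() instance, or any
    object exposing messages.count_tokens with the same shape.
    """
    counts = {}
    for model in models:
        result = client.messages.count_tokens(model=model, messages=messages)
        counts[model] = result.input_tokens
    return counts


def inflation(counts, old="claude-sonnet-4-6", new="claude-sonnet-5"):
    """Fractional increase in token count from the old model to the new one."""
    return counts[new] / counts[old] - 1


# Usage (needs the anthropic package and an API key; prompt_text is your own prompt):
#   from anthropic import Anthropic
#   counts = recount_prompt(Anthropic(), [{"role": "user", "content": prompt_text}])
#   print(counts, f"{inflation(counts):+.0%}")
```

Run it against a handful of real prompts from your tool (system prompt, rules files, a typical diff) instead of a single sample, since the inflation can vary with content.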
Real cost math for a coding session
Take a representative agentic run — one Cursor or Cline task that reads a few files, plans, and edits. On Sonnet 4.6 that might be ~50,000 input + ~8,000 output tokens. On Sonnet 5, the same work counts ~30% more: ~65,000 input + ~10,400 output. Here is what that one task costs, using verified per-token rates (no caching):
| Model | Tokens for the task | Cost per task |
|---|---|---|
| Sonnet 4.6 | 50k in / 8k out @ $3/$15 | ~$0.27 |
| Sonnet 5 (intro, through Aug 31) | 65k in / 10.4k out @ $2/$10 | ~$0.23 |
| Sonnet 5 (standard, from Sep 1) | 65k in / 10.4k out @ $3/$15 | ~$0.35 |
| Opus 4.8 | 65k in / 10.4k out @ $5/$25 | ~$0.59 |
| Claude Fable 5 | 65k in / 10.4k out @ $10/$50 | ~$1.17 |
These are illustrative figures computed from Anthropic’s published per-token prices and the stated ~30% tokenizer delta, not benchmark measurements; your real numbers depend on your codebase and how much context your tool resends. But the shape holds: during the intro window, Sonnet 5 is the cheapest capable option on this list and the second-strongest on agentic coding. Opus 4.8 costs about 2.5× as much per task ($0.59 versus $0.23) for a ~6-point score bump. That trade is why Anthropic is positioning Sonnet 5 as “a cheaper way to run agents.”
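The table can be reproduced in a few lines. This is a sketch under the same assumptions as the table (the 50k/8k baseline task, ~30% token inflation, no caching); the function and variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

MILLION = Decimal(1_000_000)


def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one task, rounded half-up to cents.

    Prices are dollars per million tokens. Decimal avoids float rounding
    surprises on borderline values such as 0.585.
    """
    raw = (Decimal(input_tokens) * Decimal(in_price)
           + Decimal(output_tokens) * Decimal(out_price)) / MILLION
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (label, input tokens, output tokens, $/MTok in, $/MTok out)
SCENARIOS = [
    ("Sonnet 4.6", 50_000, 8_000, 3, 15),
    ("Sonnet 5 (intro)", 65_000, 10_400, 2, 10),
    ("Sonnet 5 (standard)", 65_000, 10_400, 3, 15),
    ("Opus 4.8", 65_000, 10_400, 5, 25),
    ("Claude Fable 5", 65_000, 10_400, 10, 50),
]

for label, tok_in, tok_out, p_in, p_out in SCENARIOS:
    print(f"{label:<22} ${task_cost(tok_in, tok_out, p_in, p_out)}")
```

Replace the token counts with figures measured from your own sessions to get a budget that reflects how much context your tool actually resends.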
Two levers cut this further, and both apply to real coding tools:
- Prompt caching. Cache reads bill at 0.1× base input. During the promo that is $0.20/MTok; at standard rates $0.30/MTok. Cursor, Cline, and Claude Code all resend large stable context (system prompt, rules files, open files) every turn — exactly what caching is for. On a long session, cache hits can shave the input side by more than half.
- Batch API. Non-interactive work (bulk doc generation, test scaffolding, a scripted migration) runs at 50% off: $1/$5 during the promo, $1.50/$7.50 after. Not useful for interactive editing, very useful for overnight jobs.
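To see how far those two levers move the per-task figure, here is a rough sketch. The split of 50k cached tokens out of 65k is a made-up illustration, and cache-write charges are deliberately left out because the rates quoted above only cover reads, so the result overstates the saving somewhat.

```python
def input_cost(input_tokens, cached_tokens, in_price, cache_read_multiplier=0.1):
    """Input-side dollar cost when `cached_tokens` of the input are cache reads.

    in_price is dollars per million tokens; cache reads bill at
    cache_read_multiplier times that rate. Cache writes are not modeled.
    """
    fresh_tokens = input_tokens - cached_tokens
    fresh = fresh_tokens * in_price
    cached = cached_tokens * in_price * cache_read_multiplier
    return (fresh + cached) / 1_000_000


def batch_price(price, discount=0.5):
    """Per-MTok price under the Batch API discount."""
    return price * (1 - discount)


# Intro pricing, 65k input tokens, hypothetically 50k of them served from cache
uncached = input_cost(65_000, 0, in_price=2)
cached = input_cost(65_000, 50_000, in_price=2)
print(f"input side: ${uncached:.3f} uncached vs ${cached:.3f} with cache reads")

# Batch rates during and after the promo (input, output)
print(batch_price(2), batch_price(10), batch_price(3), batch_price(15))
```

With that hypothetical split, the input side of the task falls from about $0.13 to about $0.04, which is consistent with the "more than half" claim for long sessions with a stable context.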
Setting Sonnet 5 as your backend
Claude Code. It is already there. Sonnet 5 shipped in Claude Code on launch day for coding-agent workflows; select it from the model picker or set it in your config. If you are on the CLI, claude-sonnet-5 is the model ID.
Cursor. Sonnet 5 appeared in the Cursor model list on June 30. Pick it in Settings → Models, or type it in the model selector in chat/Composer. If you use Max mode, Sonnet 5 Max is available too — Cursor’s production benchmark puts Sonnet 5 Max at 61.2%, between Sonnet 4.6 Max (49.0%) and Opus 4.8 Max (63.8%). For the fixed-fee Cursor experience, Sonnet 5 running under Auto is the sensible daily driver; reserve Opus 4.8 Max for the refactors that genuinely stall.
GitHub Copilot. Sonnet 5 reached general availability in Copilot on June 30 for Pro, Pro+, Max, Business, and Enterprise. Business and Enterprise admins may need to enable the model in policy first. Select it from the model picker in the IDE or CLI. Under Copilot’s usage-based billing, remember that the tokenizer tax applies here too: more tokens per request means more metered consumption per agentic turn than the same task drew on Sonnet 4.6. That is worth watching if you switched to Copilot to control that bill after the June billing change.
Direct API (for Cline, Continue.dev, or your own agent). Point your OpenAI-compatible or Anthropic client at claude-sonnet-5. It is available on the Claude API, Amazon Bedrock (not the legacy InvokeModel/Converse APIs), Google Cloud Vertex, and Microsoft Foundry in preview.
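from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-sonnet-5",  # was "claude-sonnet-4-6"
    max_tokens=8000,  # remember: this now caps thinking + text
    messages=[{"role": "user", "content": "Refactor this module for testability."}],
)
# With thinking on by default, the first content block may not be text,
# so print only the text blocks instead of assuming resp.content[0].
print("".join(block.text for block in resp.content if block.type == "text"))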
The migration gotchas that will bite you
The drop-in promise is real, but two habits from Sonnet 4.6 setups now throw 400 errors, and a third can silently truncate output. If you scripted your own agent or tuned a Cline/Continue config, fix these first.
Temperature and other sampling parameters. Any non-default temperature, top_p, or top_k is rejected:
# Fails on Sonnet 5 with a 400 error
resp = client.messages.create(
    model="claude-sonnet-5",
    temperature=0.2,  # <-- remove this
    max_tokens=8000,
    messages=[...],
)
# Correct: omit sampling params; steer behavior with the system prompt instead
Cline and Continue.dev both expose a temperature slider. If yours is set to anything other than the default, requests will fail until you clear it. This is the number-one reason a working Sonnet 4.6 config breaks on the switch.
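If you maintain your own agent, one defensive option is to drop these keys before the request is built. A small sketch: the helper name is hypothetical, and it removes the parameters unconditionally instead of trying to guess what counts as a default value.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def strip_sampling_params(request_kwargs):
    """Return a copy of the request kwargs without sampling parameters.

    The input dict is left untouched so callers can still log what was
    originally requested.
    """
    return {k: v for k, v in request_kwargs.items() if k not in SAMPLING_PARAMS}


legacy = {
    "model": "claude-sonnet-5",
    "temperature": 0.2,
    "max_tokens": 8000,
    "messages": [{"role": "user", "content": "Refactor this module for testability."}],
}
clean = strip_sampling_params(legacy)
print(sorted(clean))  # keys that would be passed to client.messages.create(**clean)
```

Stripping in one place keeps the fix out of every call site, and it can be removed again if a later model accepts these parameters.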
Manual thinking budgets. If you set budget_tokens, migrate to adaptive thinking:
# Not supported on Sonnet 5 (returns 400)
thinking = {"type": "enabled", "budget_tokens": 32000}
# Use this instead
thinking = {"type": "adaptive"}
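For configs stored as data, the swap can be automated. A sketch with a hypothetical helper; it simply discards the old budget_tokens value, on the assumption that there is no one-to-one equivalent under adaptive thinking.

```python
def migrate_thinking(thinking):
    """Map a Sonnet 4.6-style thinking config to one Sonnet 5 accepts.

    - None (field absent) stays None; Sonnet 5 then applies adaptive
      thinking by default.
    - {"type": "enabled", "budget_tokens": N} becomes {"type": "adaptive"}.
    - Anything else is returned as an unchanged copy.
    """
    if thinking is None:
        return None
    if thinking.get("type") == "enabled":
        return {"type": "adaptive"}
    return dict(thinking)


print(migrate_thinking({"type": "enabled", "budget_tokens": 32000}))
```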
max_tokens truncation. Because adaptive thinking is on by default and max_tokens caps the combined thinking + visible output, an output limit sized tightly for Sonnet 4.6 can cut answers short. If your diffs suddenly arrive half-written, raise max_tokens before blaming the model.
One more behavioral note: Sonnet 5 is the first Sonnet with real-time cybersecurity safeguards. Requests touching prohibited or high-risk security topics may be refused — and a refusal comes back as a successful HTTP 200 with stop_reason: "refusal", not an error. If you run security tooling through it, handle that stop reason explicitly or your agent will treat a refusal as an empty success.
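Truncation and refusal both surface through stop_reason instead of an exception, so a single check after each call covers them. A minimal sketch: the function names and returned labels are hypothetical, "refusal" is the value quoted above, and "max_tokens" is the stop reason the Messages API reports when output hits the cap.

```python
def classify_response(resp):
    """Label a Messages API response as 'ok', 'truncated', or 'refused'.

    Only inspects resp.stop_reason, so any object with that attribute works.
    """
    if resp.stop_reason == "refusal":
        return "refused"
    if resp.stop_reason == "max_tokens":
        return "truncated"
    return "ok"


def visible_text(resp):
    """Join the text blocks of a response, skipping non-text blocks such as thinking."""
    return "".join(block.text for block in resp.content if block.type == "text")
```

An agent loop can then retry a "truncated" result with a larger max_tokens and route a "refused" result to a human or a different tool, instead of passing an empty string downstream.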
Where Sonnet 5 fits against the field
Against Anthropic’s own lineup, Sonnet 5 is the value pick. It clears Sonnet 4.6 on every coding benchmark that matters and sits ~6 points behind Opus 4.8 on agentic coding for less than half the price during the promo. Claude Fable 5 is a different tier of spend ($10/$50) and, as of late June, was still suspended under a US export-control directive — Sonnet 5 is the sensible everyday Claude backend regardless.
Against the broader market, the comparison is muddier because the Terminal-Bench leaderboard converged and raw scores no longer separate the top tools cleanly. GPT-5.5 leads several public boards, and cheaper backends like Gemini 3.5 Flash and the open-weight DeepSeek V4-Flash undercut everyone on price. What Sonnet 5 offers is the combination developers actually pay for: near-Opus coding quality, first-party support in Cursor, Claude Code, and Copilot on day one, and, through August 31, a price that undercuts the previous Sonnet. That window is the reason to move now rather than in September.
If your priority is never being at the mercy of a cloud price change or a model suspension, the honest answer is still a local fallback: an OpenCode + Ollama stack costs nothing per token and cannot be turned off by a directive. Pair a capable cloud backend like Sonnet 5 for the hard tasks with a local model for the routine ones. For the hardware to run that local half, see runaihome.com’s local-model VRAM guide.
FAQ
Is Claude Sonnet 5 cheaper than Sonnet 4.6?
During the introductory window (through August 31, 2026) the sticker price is lower, $2/$10 versus $3/$15, but the new tokenizer counts ~30% more tokens per task, so the real per-task saving is closer to ~13%. From September 1, Sonnet 5 moves to its standard $3/$15 pricing, identical per-token to Sonnet 4.6, which means an equivalent coding task costs roughly 30% more on Sonnet 5 because of the token count. You pay that premium for a stronger model.
What is the model ID for Claude Sonnet 5?
claude-sonnet-5. It is a drop-in replacement for claude-sonnet-4-6 in the API.
Does Sonnet 5 work in Cursor and GitHub Copilot?
Yes, both from June 30, 2026. It is in the Cursor model picker (including a Max variant) and is generally available in GitHub Copilot for Pro, Pro+, Max, Business, and Enterprise plans.
Why do my Sonnet 4.6 API calls fail on Sonnet 5?
Almost always because you are setting a non-default temperature, top_p, or top_k — those now return a 400 error. Remove them. Manual budget_tokens thinking also errors; switch to {"type": "adaptive"}.
Is Sonnet 5 good enough to replace Opus 4.8 for coding?
For most daily agentic work, yes: it scores 63.2% on agentic coding versus Opus 4.8’s 69.2% at well under half the cost during the promo. Keep Opus 4.8 (or Max mode) for the multi-file refactors where the extra ~6 points earn their keep. See our Cursor vs Claude Code and Claude Code review for tool-level guidance.
What is the context window and output limit?
A 1M-token context window (default and maximum) and 128k max output tokens, at standard pricing across the full window.
Sources
- Introducing Claude Sonnet 5 — Anthropic
- What’s new in Claude Sonnet 5 — Claude Platform Docs
- Anthropic pricing (model, caching, batch rates) — Claude Platform Docs
- Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8 benchmarks — MarkTechPost
- Anthropic launches Claude Sonnet 5 as a cheaper way to run agents — TechCrunch
- Claude Sonnet 5 is generally available for GitHub Copilot — GitHub Changelog
- Claude Sonnet 5 Now Available — Cursor Community Forum
Last updated July 1, 2026. Pricing and features change frequently; verify current state before purchasing.