Anthropic releases Claude Opus 4.7
Users complain about token burn and combative refusals, and model upgrades start to look like price changes
Anthropic’s new Claude Opus 4.7 model is drawing unusually consumer-like complaints for a product sold by the token. Within days of release, users on X and Reddit were posting side-by-side examples of mistakes, “combative” refusals, and high token burn, according to Business Insider.
The backlash is less about whether the model is “smart” than about whether it is predictable. Large language models are priced like utilities—pay per input and output token—yet behave like probabilistic systems whose cost and latency can vary by prompt, context window, and safety scaffolding. When a model uses more tokens to reach an answer, the customer pays twice: once in direct usage fees and again in developer time spent on prompt workarounds, retries, and guardrails. That creates a quiet split in the market between hobbyist usage—where “better” can be subjective—and production usage, where the key metric is cost per correct action under constraints.
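The "cost per correct action" framing can be made concrete with a little arithmetic. The sketch below uses entirely made-up prices, token counts, and success rates (none of these figures come from Anthropic); it only illustrates why production buyers compare expected dollars per successful task rather than list price per token.

```python
# Hypothetical illustration of "cost per correct action". All prices,
# token counts, and success rates are invented example numbers.

def cost_per_correct_action(price_in_per_mtok, price_out_per_mtok,
                            in_tokens, out_tokens, success_rate):
    """Expected dollars per successful task, assuming failed attempts
    are simply retried (so attempts average 1 / success_rate)."""
    per_attempt = (in_tokens * price_in_per_mtok +
                   out_tokens * price_out_per_mtok) / 1_000_000
    return per_attempt / success_rate

# A pricier, chattier model vs. a cheaper, less reliable one.
# Which wins depends entirely on the success rates you measure.
big = cost_per_correct_action(15.0, 75.0, 2_000, 1_500, success_rate=0.95)
small = cost_per_correct_action(3.0, 15.0, 2_000, 1_200, success_rate=0.70)
print(f"big model:   ${big:.4f} per correct action")    # $0.1500
print(f"small model: ${small:.4f} per correct action")  # $0.0343
```

In this made-up scenario the cheaper model wins despite failing almost a third of the time; shift the success rates and the answer flips. That sensitivity is exactly why higher token burn on an upgraded model reads to customers as a price change, not just a behavior change.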
Anthropic’s incentives are not mysterious. Frontier-model training and inference remain capital-intensive, and higher-tier models are one of the few levers companies have to segment pricing without rewriting the whole product. If Opus 4.7 is more expensive to run, passing that cost through in token consumption makes the bill look like “usage” rather than a list-price hike. But customers who budget per task, not per token, interpret that as a reliability regression.
The episode also shows how “safety” and “helpfulness” are now product features with competing constituencies. A model tuned to refuse more often may reduce certain categories of risk, but it can also feel adversarial to paying users who want deterministic compliance. Meanwhile, a model tuned to be maximally helpful can create its own liabilities when it confidently produces wrong outputs. As Business Insider notes, some users argue the costs are worth it; Y Combinator CEO Garry Tan praised the model, underscoring how different the value proposition looks to power users who can monetize marginal capability.
For now, the market’s feedback loop is blunt: developers compare bills and failure rates, then route traffic to cheaper models or smaller ensembles. The debate over Opus 4.7 is happening in public, but the decision will be made in dashboards.
Anthropic shipped Opus 4.7; the first reports about it were not benchmark charts but invoices and screenshots.