STRATEGY
Blog
February 5, 2026
12 min read

The Capability-Cost Crossover: How to Keep Your AI Strategy From Going Stale

Two forces are compounding simultaneously: capability doubling every seven months and costs falling monthly. The organizations that capture the most value will be the ones that have built a repeatable process for evolving their AI systems.

The 60 Percent Leap

Last month, one of our clients re-evaluated an AI workflow they had scoped six months earlier. Same process, same data, same goals. The difference: the operation now costs 60 percent less to run, and the models can handle steps that were flatly impossible when the project was first designed.

They didn't change their strategy. The landscape changed around them. And because they had the infrastructure to move quickly, they captured the upside immediately.

That kind of leap isn't a one-off. Two forces are compounding simultaneously, and they're reshaping the economics of every AI investment decision:

  1. Model capability is doubling roughly every seven months. Researchers at METR, an independent AI evaluation organization, have been giving AI models real-world software tasks and measuring how complex a task the models can reliably complete. That ceiling has been rising on an exponential curve. Tasks that required a human last year are now handled autonomously. Tasks that need a human today will likely be automated within 12–18 months.
  2. Costs are falling three to 10 percent per month. We wrote about this in detail in our previous analysis: the price of using a best-in-class AI model has dropped from roughly $30 per million tokens to around $2 over the past 27 months, a roughly 93 percent decline. (A quick back-of-the-envelope check of both trends follows this list.)
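
As a quick sanity check, the short Python sketch below derives the implied monthly decline rate from the article's own endpoints ($30 to $2 per million tokens over 27 months) and the one-year capability gain implied by a seven-month doubling time. The figures are the ones cited above, not new data:

```python
# Back-of-the-envelope check of the two trends, using the figures above.

start_price = 30.0  # $ per 1M tokens, ~27 months ago
end_price = 2.0     # $ per 1M tokens, today
months = 27

# Implied constant monthly decline r, from: end = start * (1 - r) ** months
monthly_decline = 1 - (end_price / start_price) ** (1 / months)
total_decline = 1 - end_price / start_price
print(f"Implied monthly price decline: {monthly_decline:.1%}")  # ~9.5%
print(f"Total decline: {total_decline:.1%}")                    # ~93.3%

# A seven-month capability doubling time implies, over one year:
capability_gain = 2 ** (12 / 7)
print(f"Capability multiple after 12 months: {capability_gain:.1f}x")  # ~3.3x
```

The implied rate of roughly 9.5 percent a month sits at the top of the three-to-10-percent band, consistent with the pricing claim.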

Methodology

METR is an independent research organization that evaluates AI capability by giving models real software engineering tasks of varying complexity and measuring completion rates. Their dataset tracks how the upper bound of AI task completion has shifted over time, providing one of the most rigorous available measures of AI capability progression.

METR Task Complexity Over Time

Chart: the time horizon of software engineering tasks different LLMs can complete 50 percent of the time, rising exponentially from near zero in 2020 to over five hours by 2025.

Source: METR. The Y-axis shows the task duration (for humans) at which a logistic regression predicts a 50 percent chance of AI success. Recent frontier models can now handle tasks that would take a skilled engineer over five hours.

Each of these trends alone would be significant. Together, they create a compounding opportunity that most organizations are not structured to capture. The companies that will benefit most aren't the ones that picked the right model last year. They're the ones building the ability to evolve their AI systems as both curves move.

This article lays out a practical framework for doing exactly that.

What “AI System” Means (And Why Model Selection Is Only One-Third of the Decision)

When most organizations talk about their “AI strategy,” they mean which model they're using. That's like saying your manufacturing strategy is which machine you bought. The machine matters, but so does the plant it sits in and the jobs you assign to it.

An AI system has three layers, and each one deserves its own strategic attention:

Which engine do you need?

Model selection is more nuanced than picking the most powerful option. AI models now exist on a wide spectrum, from lightweight models that cost fractions of a penny per use to heavyweight models that can reason through complex, multi-step problems. The best available models today can work autonomously on tasks that would take a skilled human several hours. But you don't need that horsepower for every job, and overspending on model capability is one of the most common mistakes we see.

The better question: “Which model is best for this specific task at this specific cost?”

Where does it run?

Serving infrastructure determines how the model is deployed and accessed. Options range from calling a provider's API directly (simplest, fastest to start) to running models on your own servers (more control, more complexity). The right answer depends on factors that have nothing to do with AI capability: data residency requirements, latency sensitivity, compliance obligations, and how much operational overhead your team can absorb.

Most mid-market companies should start with direct API access and only move toward self-hosted infrastructure when a specific regulatory or performance requirement demands it.

What jobs do you give it?

Task allocation is where the real strategic leverage lives. Not every task in your operation needs the same level of AI capability. A well-designed system routes simple, repetitive tasks to fast and cheap models, while reserving expensive, high-capability models for complex reasoning and judgment calls. Think of it like staffing: you don't assign your most experienced engineer to every job on the floor.

The organizations getting the most value from AI are the ones that have mapped their task portfolio against model capability tiers, and built the infrastructure to route work accordingly.

Model Tier Comparison

Tier        | Cost Range                | Best-Fit Tasks                                                     | Example Use Cases
Lightweight | $0.01–$0.50 / 1M tokens   | Simple classification, text extraction, routing, summarization    | Invoice field extraction, email categorization, FAQ routing
Mid-Tier    | $0.50–$5.00 / 1M tokens   | Document understanding, multi-step extraction, content generation | Report generation, contract analysis, customer response drafting
Frontier    | $5.00–$30.00 / 1M tokens  | Complex reasoning, multi-step planning, autonomous workflows      | Strategic analysis, code generation, multi-document synthesis
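
To make the routing idea concrete, here is a minimal sketch of a task router keyed on the tiers in the table above. The task types, model names, and prices are illustrative placeholders, not recommendations:

```python
# A minimal task router over the three tiers from the table above.
# Task types, model names, and prices are made up for the example.

ROUTES = {
    "invoice_extraction": "lightweight",
    "email_categorization": "lightweight",
    "contract_analysis": "mid",
    "report_generation": "mid",
    "strategic_analysis": "frontier",
}

MODELS = {
    "lightweight": {"model": "small-model-v1", "usd_per_1m_tokens": 0.25},
    "mid": {"model": "mid-model-v1", "usd_per_1m_tokens": 2.50},
    "frontier": {"model": "frontier-model-v1", "usd_per_1m_tokens": 15.00},
}

def route(task_type: str) -> dict:
    """Pick a model tier for a task; default to mid when the type is unknown."""
    tier = ROUTES.get(task_type, "mid")
    return {"tier": tier, **MODELS[tier]}

print(route("invoice_extraction"))
# {'tier': 'lightweight', 'model': 'small-model-v1', 'usd_per_1m_tokens': 0.25}
```

In practice the routing table would live in configuration rather than code, a point we return to in the roadmap section below.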

The Capability Dividend: The Most Important Concept in AI Budgeting Right Now

Here's where the two curves (rising capability, falling cost) create a strategic decision point that most organizations aren't even aware they're facing.

Every quarter, the cost of running your current AI workload drops. That creates a surplus we call the Capability Dividend. You have three ways to spend it:

The Capability-Cost Crossover

Chart: AI capability rising from 15 to 95 while cost per million tokens falls from $30 to $1.50 between Q1 2023 and Q1 2026. The widening gap between the two lines represents the Capability Dividend.

Sources: METR capability benchmarks, public API pricing data. The capability score represents the complexity of tasks AI can reliably complete autonomously.

Option A: Pocket the savings.

Reduce your AI spend. Defensible if budgets are tight, but the least strategic choice. You're essentially declining a compounding investment.

Option B: Upgrade capability at the same budget.

Keep your spend flat, but move tasks to more powerful models. The report that used to be generated by a lightweight model with frequent errors can now be handled by a mid-tier model that gets it right the first time, for the same cost. Quality goes up. Human review time goes down.

Option C: Expand coverage at the same budget.

Keep your spend flat, but bring AI into workflows that were previously too expensive to justify. The document processing task that cost $8 per unit last year now costs $1.50. Suddenly, it makes sense to automate the long tail of documents you were handling manually.

Most organizations should default to Options B and C. The capability gains compound: better models produce better outputs, which require less human correction, which frees up your team to focus on higher-value work, which makes the next round of AI expansion more impactful.
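
To put rough numbers on the three options, here is an illustrative sketch. The budget, prices, and volumes are invented for the example, not drawn from any client data:

```python
# Illustrative Capability Dividend arithmetic. All numbers are hypothetical.

budget = 10_000.0        # monthly AI budget, USD
old_price = 2.50         # $ per 1M tokens when the workload was scoped
new_price = 1.00         # $ per 1M tokens today
monthly_tokens_m = 3_000 # current workload, in millions of tokens per month

old_cost = monthly_tokens_m * old_price  # $7,500
new_cost = monthly_tokens_m * new_price  # $3,000
dividend = old_cost - new_cost           # $4,500 freed up each month

# Option A: pocket the $4,500.
# Option B: move the workload to a stronger tier at, say, $2.40 per 1M tokens:
#           3,000 * 2.40 = $7,200, still under the old cost.
# Option C: keep the tier and expand coverage with the freed-up budget:
extra_tokens_m = dividend / new_price
print(f"Dividend: ${dividend:,.0f}/month; "
      f"Option C headroom: {extra_tokens_m:,.0f}M extra tokens/month")
```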

The distinction matters. Falling costs are a capability upgrade opportunity, not a savings opportunity. Organizations that pocket the savings will be steadily outpaced by those that reinvest them.

Building the Review Cadence: A Quarterly AI System Check

Static AI strategies are already obsolete. When capability doubles every seven months and costs shift monthly, an annual review cycle guarantees you're operating on stale assumptions.

We recommend a lightweight quarterly review structured around three questions:

Have new models crossed a threshold that matters to us?

Not every model release is relevant to your operation. But roughly once a quarter, a new model or a major update will unlock a capability that directly affects your task portfolio. Maybe it's reliable document understanding for messy scanned PDFs. Maybe it's the ability to follow a multi-step process without losing context. You don't need to chase every release. You need a systematic way to identify when a new capability intersects with a real business need.

Can we upgrade model tier for existing tasks within the same budget?

This is the Capability Dividend in action. Pull your current usage and spend, re-price it against current model costs, and see where you have headroom. Moving existing workflows to a stronger model is often the highest-ROI move because it improves quality on workflows that are already in production, with no new integration work required: a configuration change rather than an engineering project.
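
A minimal version of that re-pricing exercise might look like the sketch below. The usage rows and both price sheets are hypothetical; in practice they would come from your billing exports and your providers' current published prices:

```python
# Re-pricing sketch for the quarterly review. All data is hypothetical.

usage = [  # (task, tier, millions of tokens last quarter)
    ("invoice_extraction", "lightweight", 1_200),
    ("contract_analysis", "mid", 400),
    ("strategic_analysis", "frontier", 60),
]

prices_then = {"lightweight": 0.40, "mid": 3.00, "frontier": 20.00}  # $/1M
prices_now = {"lightweight": 0.25, "mid": 2.00, "frontier": 12.00}   # $/1M

for task, tier, tokens_m in usage:
    then = tokens_m * prices_then[tier]
    now = tokens_m * prices_now[tier]
    print(f"{task}: ${then:,.0f} -> ${now:,.0f} ({1 - now / then:.0%} headroom)")
```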

Which tasks should move up or down the capability ladder?

Some tasks that required your most powerful model six months ago can now be handled by a cheaper tier. Today's mid-range models outperform last year's best. Other tasks that were too complex for AI are now within reach. The task portfolio mapping from the previous section pays off here: you need a clear picture of what you're running, at what tier, to make these moves efficiently.

The operational cost of the quarterly review is low: a few hours per quarter. The cost of not doing it compounds quickly.

What the Crossover Means for Your Roadmap

In the next six months:

Audit your current AI spend against capability requirements. The most common finding: you're simultaneously overspending on simple tasks (using a powerful model where a lightweight one would suffice) and under-investing in complex tasks (where a more capable model would eliminate manual rework). Fixing the misalignment is usually the fastest win.
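
One way to run that audit is to compare each task's assigned tier with the tier it actually needs. The "required tier" column is your own assessment, and the tasks below are hypothetical:

```python
# A sketch of the spend-vs-requirement audit. Tasks and tiers are hypothetical;
# "required" reflects your own assessment of each task's complexity.

TIER_RANK = {"lightweight": 0, "mid": 1, "frontier": 2}

tasks = [
    {"name": "email_categorization", "assigned": "frontier", "required": "lightweight"},
    {"name": "contract_analysis", "assigned": "lightweight", "required": "mid"},
    {"name": "report_generation", "assigned": "mid", "required": "mid"},
]

for t in tasks:
    gap = TIER_RANK[t["assigned"]] - TIER_RANK[t["required"]]
    if gap > 0:
        print(f"{t['name']}: overspending, downgrade {t['assigned']} -> {t['required']}")
    elif gap < 0:
        print(f"{t['name']}: under-invested, upgrade {t['assigned']} -> {t['required']}")
```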

Over the next six to 18 months:

Build or adopt task-routing infrastructure that can dynamically allocate work to different model tiers. The routing layer doesn't have to be complex. Even a simple decision layer that routes by task type delivers significant value. The goal is to make model-tier changes a configuration decision, not an engineering project.
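
As a sketch of what "configuration decision, not engineering project" can mean in practice: keep the task-to-tier mapping in a file that gets edited at review time. The file name and schema here are illustrative:

```python
# Externalized routing config: tier changes become file edits, not deployments.
# The file name and schema are illustrative.

import json
from pathlib import Path

# Seed an example config; in practice this file lives in your repo or
# config store and is updated during the quarterly review.
Path("routing.json").write_text(json.dumps({
    "invoice_extraction": "lightweight",
    "contract_analysis": "mid",
}))

routes = json.loads(Path("routing.json").read_text())

def model_tier(task_type: str) -> str:
    """Upgrading a task next quarter means editing routing.json, not code."""
    return routes.get(task_type, "mid")

print(model_tier("contract_analysis"))  # "mid"
```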

On the 18-month-and-beyond horizon:

Prepare for AI systems that can handle multi-hour autonomous workflows. The METR research suggests we're two to four years from models that can reliably work on week-long tasks without human intervention. That's still speculative, but the trajectory is clear enough to start thinking about which of your processes would be transformed by that capability, and what organizational changes you'd need to make to take advantage of it.

A note on budgeting:

Traditional annual IT budget cycles cannot accommodate this rate of change. By the time leadership approves an annual budget, the cost and capability assumptions baked into it are already outdated. We recommend quarterly AI system reviews with pre-approved upgrade paths: essentially, a mandate for your team to reallocate within a defined envelope as conditions change, rather than a new approval cycle for every adjustment.

The Honest Tension: Evolution vs. Stability

We'd be doing you a disservice if we presented the crossover as purely upside. There's a real cost to continuous evolution:

  • Every model change requires testing and validation.
  • Task routing adds architectural complexity.
  • Teams need to develop judgment about when to upgrade and when to hold steady.
  • Some operations genuinely benefit from stability. If a workflow is performing well and isn't cost-sensitive, the right move may be to leave it alone.

You don't need to change everything every quarter. You need the ability to change, and deliberate decisions about what evolves and what stays. The real risk is having no mechanism to move at all, and waking up 18 months from now running a system that's two generations behind at twice the cost.

Your AI System Is a Living Strategy

Both curves (capability doubling every seven months, costs declining monthly) show no signs of slowing down. Competition between providers is accelerating both trends.

The organizations that capture the most value will be the ones that have built a repeatable process for evolving their AI systems: reassessing model selection, re-evaluating serving infrastructure, and continuously right-sizing task allocation as the landscape shifts.

Your AI system is an operating strategy, and like any good operating strategy, it needs regular review and deliberate evolution.

Next Step

See Where You Sit on the Curve

We run AI system reviews with our clients every quarter. Find out where your current setup sits on the capability-cost curve, and where the Capability Dividend could take you.

Book a Fit Call