Since 1994

Choosing the

right model.

And knowing when to switch · Rex Black, Inc.

"Use the biggest model"

is the default.

And it's wrong.


Safe at prototype. Expensive at production.


Mid-market AI deployments we audit: 60 to 80 percent of the bill is available savings, at quality parity, from disciplined routing.

REX BLACK, INC. · MODEL SELECTION
Four axes

Score every choice

on the same four axes.


Axes

Capability. Score on your golden set. Public benchmarks have no predictive value.
Latency. P50 and P95 against the call site budget.

Axes (cont.)

Cost. Per-decision, fully loaded. Include retry rate and fallback rate.
Reliability. Single-provider 99.5% = four hours of degraded ops per month.

REX BLACK, INC. · MODEL SELECTION

The rule

that matters most.


If a smaller model scores within 2 to 3 points of a larger model on your rubric, that is within the noise floor.


The smaller model is capable enough.
The cost gap is a pure economic win.

REX BLACK, INC. · MODEL SELECTION
Routing

Cascade routing

with a confidence gate.


Shape

First pass, small model. ~80% resolved here.
Confidence score on the output.
Promote below threshold to flagship.

Tuning metric

Promotion rate. 15-20% after tuning is the target.
Above 30% = gate is wrong.
Below 10% = quality is bleeding.

REX BLACK, INC. · MODEL SELECTION

The economics

at three tiers.


10K/day

Flagship: ~$9K/month.
Cascade: ~$3K/month.

100K/day

Flagship: ~$90K.
Cascade: ~$28K.

1M/day

Flagship: ~$900K/month.
Cascade: ~$240K/month.


Payback

Fastest engineering work on the backlog.

REX BLACK, INC. · MODEL SELECTION
Prompt · Fine-tune · Switch

Three capability moves.

In order.


Prompt first

Reversible. Cheap. Always first.

Fine-tune

Only when the task is stable and narrow. Be honest about the re-train obligation.

Switch tiers

When the gap is capability, not data.

No amount of prompting closes a capability gap.

REX BLACK, INC. · MODEL SELECTION

Switching cost

is higher than you think.


Taxes

Prompt tuning. Prompts rarely port across models.
Eval re-run. Full golden set, new scores.

Taxes (cont.)

Regression re-verify. Known bugs may not still be bugs.
Operational memory. Two weeks of on-call recalibration.


Design for switching from day one. Prompts and models are config, not code.

REX BLACK, INC. · MODEL SELECTION

Multi-provider fallback.

Not optional for revenue-touching AI.


Every provider has at least one significant outage per quarter.


The cheapest viable pattern: two providers, one hot standby, routed through a thin abstraction.

The standby does not need to be same-family. It needs to be good enough to carry traffic while the primary recovers.

REX BLACK, INC. · MODEL SELECTION
Architecture

The provider

abstraction layer.


Fifteen lines of code. Build it before you need it.


Input

Request payload.
Model hint.
Latency budget.

Output

Response.
Provider used.
Cost incurred.

REX BLACK, INC. · MODEL SELECTION

This week.


Instrument per-decision cost on the hottest feature. If the answer is "we would need to calculate," that is the first gap.
Run the cascade prototype. Two-tier, naive gate. If quality holds and cost drops, you just funded the next two quarters of AI work.
Audit switching cost. If migrating off the current model takes more than five engineering days, your lock-in is bigger than the team realizes.

REX BLACK, INC. · MODEL SELECTION
Rex Black, Inc.

Route before

you upgrade.

rexblack.com/resources/writing/ai-model-selection-framework