Whitepaper · Updated April 2026 · 10 min read

Metrics for Software Testing, Part 4: Product Metrics

Product metrics measure the thing you're actually about to ship: requirements coverage per area, residual quality risk at a glance, and the risk-category table that keeps a release decision honest. They are the most often-forgotten category of test metric, and the one that prevents bad release decisions. Part 4 of a four-part series.


Series · Part 4 of 4 · Managing with facts

Product metrics measure the quality of the thing you're about to ship. They are the most often-forgotten category of test metric. Project metrics tell you whether the testing is on track to finish on time — they don't tell you what you'll be shipping when it does. This paper covers the two product-metric patterns that anchor a release decision: requirements coverage per area, and residual quality risk at a glance.

Four-part series: Part 1 — Why & how · Part 2 — Process metrics · Part 3 — Project metrics · Part 4 (this paper) — Product metrics

What product metrics are for

Process metrics measure capability. Project metrics measure progress. Product metrics measure the quality of the system itself — and they're the metrics most likely to be missing from a test dashboard.

Consider this status line:

95% of tests run. 90% passed. 5% failed. 4% ready. 1% blocked. On schedule.

Is this good news or bad news? It's not possible to say. If the 10% of failed, ready-to-run, and blocked tests are unimportant — and if the defects behind the 5% of failed tests are unimportant — it's good news. If any of those numbers relate to the load-bearing requirements or the highest-impact risks, it's a crisis. Project metrics by themselves can't tell you which one it is. Product metrics can.

The role of the test team

Testing's role is to measure quality, not to directly improve it. Product metrics reflect the totality of the engineering and project team's work — code quality, requirements quality, architecture, process capability, everything. Using product metrics to reward or punish testers will simply distort the metrics. The consequence is a release decision made on flattering numbers that don't reflect reality.

Requirements coverage — per area, per outcome

The foundation of a product-metric dashboard is requirements coverage reported by major area, broken down by status: untested, tested-and-failed, tested-and-passed, blocked.

Per-area product metric

Requirements coverage on an e-commerce release

Percentage of requirements in each status, by major feature area.

| Area | Passed | Failed | Blocked | Untested |
|---|---|---|---|---|
| Browsing | 49% | 7% | 2% | 42% |
| Checkout | 96% | — | — | 4% |
| Store management | 20% | 5% | — | 75% |
| Performance | 100% | — | — | — |
| Reliability | 33% | — | 67% | — |
| Security | — | — | — | 100% |
| Usability / UI | 48% | 52% | — | — |

Each requirement is classified by the status of the tests traced to it. If all associated tests ran and passed, the requirement is passed; if any ran and failed, failed; if any are blocked (and none failed), blocked; the remainder are untested.
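The classification rule above can be sketched in code. This is a minimal illustration, not from the paper: the names are hypothetical, and the precedence when a requirement has both failed and blocked tests is an assumption (failure wins).

```python
from enum import Enum

class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"
    UNTESTED = "untested"

def requirement_status(test_results):
    """Classify one requirement from the outcomes of its traced tests.

    `test_results` holds one entry per traced test: "passed", "failed",
    "blocked", or "not_run". Precedence is an assumption: any failure
    wins, then any blockage, then all-passed; everything else (including
    a requirement with no tests at all) is untested.
    """
    if any(t == "failed" for t in test_results):
        return Status.FAILED
    if any(t == "blocked" for t in test_results):
        return Status.BLOCKED
    if test_results and all(t == "passed" for t in test_results):
        return Status.PASSED
    return Status.UNTESTED
```

Aggregating these per-requirement statuses by feature area yields exactly the per-area percentages plotted above.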

What the chart above actually tells us

Read it row by row. This is a late-cycle e-commerce release.

  • Browsing — a major feature. Half passed, 9% failing or blocked, 42% still untested. Significant defect work remaining. Release decision: not ready.
  • Checkout — almost entirely passed, 4% untested. Highest-confidence area. Release decision: ready pending final pass.
  • Store management — critical for operations (staff can't put items in the store without it). Only 20% passed, 5% failing, 75% untested. Dev and test both behind. Release decision: major concern.
  • Performance — fully tested, zero failures. Release decision: confident.
  • Reliability — 67% blocked. Blocking issues probably need management escalation, not more tester effort. Release decision: unknown until unblocked.
  • Security — 100% untested. The test manager owes someone a conversation about why. Release decision: unknown.
  • Usability / UI — fully tested, 52% failing. This is the release's biggest in-your-face quality problem. Release decision: major UX remediation needed before ship.

The number of tests doesn't enter the picture. We're reporting requirements status — the question is whether each requirement works, not how many tests backed it. This matters because the relationship between "requirements work" and "stakeholders are satisfied" is much more direct than the relationship between "test cases pass" and "stakeholders are satisfied."

Direct or surrogate metric?

Both, depending on perspective.

  • From a verification perspective — does the system satisfy its specified requirements? — requirements coverage is a direct metric, and a very good one.
  • From a validation perspective — will the system satisfy customers and users in the field? — it is a surrogate metric. Good requirements reflect stakeholder needs, but there's no software project in history where the requirements captured every need perfectly. Requirements coverage is necessary for validation but not sufficient.

For validation, the multi-dimensional coverage metrics in Measuring Confidence Along the Dimensions of Test Coverage give you a more complete picture.

Residual quality risk — the at-a-glance release chart

If the team is running a risk-based test strategy, every quality risk item has one or more tests traceable to it, and the number of tests per risk scales with the level of risk. That lets you build the single most useful release-decision chart in software testing: the residual quality risk view.

Each risk item belongs to one of three categories:

  • Mitigated — all tests ran, all passed, no must-fix defects known.
  • Pending — no known defects, but tests remain to run.
  • Failed — at least one test failed, or at least one must-fix bug is known.

The chart shows the proportions, weighted by risk level — so a high-risk item gets a larger slice than a low-risk item in the same category. That weighting is what turns the chart from a pretty picture into a release-decision artifact.
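A sketch of that weighting in code. The three-level weights below are hypothetical (the paper doesn't prescribe a scale); any monotonic weighting by risk level produces the same effect, namely that higher-risk items claim larger slices.

```python
from collections import defaultdict

# Hypothetical weights by risk level; the paper doesn't prescribe a
# scale, only that higher-risk items must count for more.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}

def residual_risk_slices(risk_items):
    """Risk-weighted Mitigated / Pending / Failed percentages.

    `risk_items` is a sequence of (category, level) pairs, where
    category is "mitigated", "pending", or "failed". Each item
    contributes its risk weight rather than a flat count, so a
    high-risk item gets a larger slice than a low-risk item in
    the same category.
    """
    totals = defaultdict(float)
    for category, level in risk_items:
        totals[category] += WEIGHTS[level]
    grand = sum(totals.values()) or 1.0
    return {cat: 100.0 * w / grand for cat, w in totals.items()}
```

With one low-risk mitigated item, one high-risk failed item, and one medium-risk pending item, the slices come out roughly 16.7% / 50% / 33.3% rather than an unweighted one-third each, which is the point of the weighting.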

The three snapshots you'll see over a successful cycle

| Snapshot | Mitigated | Failed | Pending |
|---|---|---|---|
| Week 2 of 10 (early test execution) | 15% | 18% | 67% |
| Week 5 of 10 (middle of test execution) | 48% | 32% | 20% |
| Week 9 of 10 (late test execution) | 78% | 12% | 10% |

A healthy project moves through three recognizable phases:

  • Early — failed grows faster than mitigated because the high-risk items are tested first and the obvious defects surface early. This is the point of risk-based testing: find the big problems while there's still time to fix them.
  • Middle — failed and mitigated both grow quickly, and pending shrinks as the test team burns through the planned tests.
  • Late — mitigated grows fast as confirmation tests of fixed bugs pass, the failed slice shrinks, and pending is mostly gone. Late-discovered bugs are typically lower-risk items.

When is release OK?

This is a business decision, not a metric decision. If the project management team judges that the residual risk — known open defects, known test failures, and yet-unrun tests — is acceptable compared to the cost of continuing to test, then release is defensible. This is informed subjectivity: you know what you don't know, you know what you're trading off, and you can explain it.

Whether product quality is adequate is a separate question. A good quality risk analysis (see Quality Risk Analysis) reflects the impact on customers and users accurately. Combined with a good residual-risk chart, the decision is made with facts. Bad release decisions almost always come from making this decision without the chart.

The category-level table — where detail lives

The pie chart is the executive view. The project management team needs a detail view organized by risk category:

| Risk category | Defects | % of defects | Tests planned | Tests executed | % executed |
|---|---|---|---|---|---|
| Performance | 304 | 27% | 3,843 | 1,512 | 39% |
| Security | 234 | 21% | 1,032 | 432 | 42% |
| Functionality | 224 | 20% | 4,744 | 2,043 | 43% |
| Usability | 160 | 14% | 498 | 318 | 64% |
| Interfaces | 93 | 8% | 193 | 153 | 79% |
| Compatibility | 71 | 6% | 1,787 | 939 | 53% |
| Other | 21 | 2% | 0 | 0 | 0% |
| Total | 1,107 | 100% | 12,857 | 5,703 | 44% |

Read the table row by row, just as with the requirements-coverage view above:

  • Performance — the largest defect count, only 39% of tests run. Expect more defects. Need fast fix turnaround and any test-execution blockers removed now.
  • Security — second-largest defect count and 58% of tests still to run. Similar pattern to performance.
  • Functionality — mid-tier defect count, 57% of tests still to run. Depending on overall schedule, may need acceleration (more testers, more parallelism, or scope cut).
  • Usability — 64% executed already. If you're running risk-based, the highest-value tests ran early, so this is probably proceeding acceptably.
  • Interfaces — low defect count, 79% executed. Almost done.
  • Compatibility — low defect count, but 47% of tests still pending. Worth investigating why.
  • Other — defects not traced to an identified risk item. If this exceeds 5% of total defects, the quality risk analysis itself has gaps — something was missed.
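The arithmetic behind the 'Other' bullet above is simple enough to automate against the defect tracker. A sketch, using the defect counts from the table; the function name and dictionary shape are illustrative, not from the paper.

```python
def other_share(defects_by_category):
    """Fraction of defects landing in 'Other', i.e. tracing to no
    identified quality risk item. Above roughly 5%, the quality risk
    analysis itself has gaps."""
    total = sum(defects_by_category.values())
    return defects_by_category.get("Other", 0) / total

# Defect counts from the risk-category table above.
counts = {"Performance": 304, "Security": 234, "Functionality": 224,
          "Usability": 160, "Interfaces": 93, "Compatibility": 71,
          "Other": 21}

share = other_share(counts)        # about 1.9%, within the 5% target
risk_analysis_has_gaps = share > 0.05
```

On this project the check passes; a failing check is a prompt to revisit the risk analysis, not to reclassify defects until the number looks better.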

What the numbers should look like

Four targets for the table (the last is a standing rule rather than a number to hit):

  • Defect distribution matches the predicted distribution — risk categories with the most high-likelihood items should have the most defects. Unexpected imbalances point to a flawed risk analysis.
  • 'Other' share of defects: under 5% — defects not tracing back to any identified risk item. Above 5% means risks were missed during analysis.
  • Planned tests executed by release: ~100% — most or all; deliberate skipping is sometimes acceptable if the risk analysis was adjusted mid-flight.
  • Metric-based individual reviews: 0% — still the rule. Product metrics measure the product, not the people.

If any target slips badly, the retrospective should dig into why — and in particular, whether the quality risk analysis itself needs rework. A good risk analysis is not once-and-done; it's a document you tune over the life of the product.

The full product-metric dashboard

A minimum balanced product-metric dashboard:

  1. Requirements coverage table by major area (stacked bar above, or an equivalent tabular view).
  2. Residual quality risk chart — three slices, risk-weighted, refreshed at every status meeting.
  3. Risk-category detail table — for the project management team, not for the executive dashboard.
  4. Multi-dimensional coverage snapshot — see the coverage-dimensions paper for the full recipe.

Four views. Combined with the bug-trend and test-fulfillment charts from Part 3, and the DDE and DCP process metrics from Part 2, you have a complete instrument panel: you can see process capability, project progress, and product quality at the same time. Managing with facts isn't an abstraction at that point; it's the daily practice of reading a few clear charts and acting on them.

Closing the series

Four articles, one goal: build metrics programs worth having. The ideas the series returns to, in four sentences:

  1. The goal-question-metric method is the only reliable way to produce metrics worth producing. Tools give you metrics; tools don't give you goals.
  2. Balance everything. No single metric is trustworthy alone.
  3. Never tie metrics to individual performance appraisal. It destroys the metric and the team.
  4. Presentation matters as much as measurement. Elegance is not a luxury — it's what turns a number into a decision.

A successful metrics program is built with the test team's stakeholders, not defended against them. If your program doesn't exist yet, pick the four-metric starter set in each paper (process / project / product), baseline against your own team's current state, set honest goals, and review quarterly. That's enough to change how your team is perceived and how your releases land. Do that, and you're managing with facts.


Rex Black, Inc. (RBI) · Enterprise technology consulting · Dallas, Texas
