Reference · Quality Risk Categories

The risk taxonomy every FMEA starts from.
Sixteen categories, three test-level views, and the 2026 additions nobody used to need.

The list of quality risk categories our FMEAs, test plans, and risk-based strategies start from. Use it as a completeness check — every category is a question you should have answered for this release, even if the answer is "not applicable."

Flat categories: 16 · Test-level views: 3 · 2026 additions: 7

A risk taxonomy is a completeness check, not a filing cabinet. Use it to be confident you have not missed an entire class of failure — then discard the categories that do not apply to this release.

Key Takeaways

Four things to remember.

01

Sixteen categories, zero ceremony

The flat list is a walk-through: name each category out loud at the FMEA kickoff. "Localization — applies? Installation — applies? Competitive inferiority — applies?" Three minutes, no missed class.

02

Different test levels see different risks

Component testing sees states, transactions, and code coverage. System testing sees localization, installation, and competitive inferiority. The per-level cross-reference keeps each team focused on what they can actually test.

03

The taxonomy has to move forward

The 2002 taxonomy did not include accessibility as a legal exposure, observability as a release signal, or AI-system-specific risks at all. The 2026 additions section names the seven categories every modern risk analysis should add.

04

Pair with ISO/IEC 25010:2023

When a formal framework is required (regulated environments, buyer-mandated compliance), map this taxonomy to ISO/IEC 25010:2023 product-quality characteristics. Both views are useful; neither is a substitute for the other.

Overview

Every risk-based test strategy starts with a list. A quality risk category is a class of failure the system could exhibit; the risk items inside each category are the specific failures a particular engagement is exposed to.

Most programs waste their first FMEA session reinventing the category list. This page publishes the one we use. Treat it as a starting menu — not every category applies to every product, but naming them all forces an explicit answer instead of an implicit gap.

Below: (1) the flat 16-category reference, (2) the per-test-level cross-reference (which categories are visible at component, integration, and system/acceptance levels), and (3) the seven categories we added between 2002 and today because products and their failure modes have changed.

Three views of one taxonomy

What to expect below.

The 16-category flat list is the completeness check every FMEA should walk at session start. The per-test-level re-cut is the planning aid that tells you which team owns which risk. The seven 2026 additions close the gap between the 2002 taxonomy and today’s failure surface.

  • 16 flat categories: the canonical taxonomy. Walked top-to-bottom in three minutes before every FMEA.
  • 3 test-level views: component, integration, system-and-acceptance. Each level sees a different subset.
  • 7 2026 additions: accessibility, observability, FinOps, supply chain, AI accuracy, AI safety, explainability.
  • Pair with ISO/IEC 25010:2023: map to the ISO characteristics when a formal framework is required. Both views coexist.


1. The 16 quality risk categories

The canonical flat taxonomy. Each category is described as "what kind of problems fit here" — deliberately narrow, so an item belongs in exactly one category when you are populating an FMEA.

Functionality

Failures that cause specific features not to work as specified.

Load, Capacity, and Volume

Failures to scale to expected peak concurrent usage and data volumes. Distinct from Performance: Performance asks whether a single request meets its SLO; Load asks whether the system still meets its SLO with 10,000 concurrent users.

Reliability / Stability

Failures to meet reasonable expectations of availability and mean-time-between-failure. Includes memory leaks, resource exhaustion, and degradation over uptime.

Stress, Error Handling, and Recovery

Failures under beyond-peak or illegal conditions, and the knock-on effects of deliberately inflicted errors. Covers recovery from power loss, network partition, and upstream service failure.

Date Handling

Failures in date math and handling — time-zone boundaries, daylight-saving transitions, leap seconds, year boundaries, fiscal vs. calendar year, and (still, in 2026) systems whose internal epochs expire; the 32-bit Unix time rollover of January 2038 is the canonical example.
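A concrete instance of the daylight-saving risk, sketched with Python's standard zoneinfo (the date and zone are illustrative): wall-clock arithmetic can land on a local time that never existed.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# 2026-03-08 is the US spring-forward date: 02:00-02:59 local never happens.
before = datetime(2026, 3, 8, 1, 30, tzinfo=NY)  # 01:30 EST (UTC-5)
after = before + timedelta(hours=1)              # wall-clock arithmetic: "02:30"

# 02:30 does not exist on this date. Converting through UTC exposes the gap:
utc = after.astimezone(ZoneInfo("UTC"))          # 07:30 UTC
local = utc.astimezone(NY)                       # 03:30 EDT, not 02:30
print(after.hour, local.hour)                    # 2 3
```

Risk items in this category are exactly these mismatches: code doing wall-clock arithmetic where elapsed-time arithmetic was intended, or vice versa.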

Competitive Inferiority

Failures to match competing systems in quality. Frequently overlooked in internal risk analysis because it requires market research, not engineering research.

Operations and Maintenance

Failures that endanger continuing operation, including backup and restore, runbooks, on-call workflows, and the ability for operations staff to recover from production incidents without engineering escalation.

Usability

Failures in human factors — especially at the user interface, but also in the installation flow, onboarding, and recovery from user error.

Data Quality

Failures in processing, storing, or retrieving data. Includes silent corruption, precision loss, truncation, encoding mangling, and failed referential integrity.

Performance

Failures to perform as required under expected loads. Latency, throughput, and responsiveness against an SLO or budget.

Localization

Failures in specific localities — language, dictionary/thesaurus, collation order, number/date/currency formatting, and localized error messages.

Compatibility

Failures with specific supported OS / browser / device / runtime / dependency combinations. Includes regression against dependency upgrades (the kind CI finds before users do) and the combinatorial explosion of "minor versions."

Security and Privacy

Failures to protect the system and secured data from fraudulent or malicious misuse. Pair with the seven-step security risk reduction whitepaper — this is a surface area with its own deeper taxonomy (OWASP Top 10, MITRE ATT&CK, CWE).

Installation / Migration

Failures that prevent or impede deploying the system. In 2026 this also covers CI/CD failures, canary/blue-green/rollback integrity, database migration safety, and rollback data loss.

Documentation

Failures in operating instructions for users or system administrators, including API reference accuracy, runbook accuracy, and deprecation notice quality.

Interfaces

Failures in interfaces between components — wire formats, contract violations, schema drift, protocol version mismatches, silent field removal.
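The last category lends itself to cheap automation. A minimal sketch of a consumer-side contract check that turns silent field removal into an explicit failure; the field names and types here are hypothetical:

```python
# Hypothetical consumer contract: the fields this consumer actually reads.
REQUIRED_FIELDS = {"id": int, "email": str, "created_at": str}

def check_contract(payload: dict) -> list[str]:
    """Return a list of contract violations (missing fields, wrong types)."""
    violations = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            violations.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    return violations

# A provider that silently drops a field fails loudly here:
print(check_contract({"id": 1, "email": "a@b.c"}))  # ['missing field: created_at']
```

Real contract-testing tools (Pact, JSON Schema validation) do the same check with more machinery; the risk item is the absence of any such check, not the tool choice.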


2. Quality risks by test level

The same taxonomy re-cut by which test level has the signal to find each risk. This is a planning aid: assign risk categories to the test level where they are most cost-effectively caught, and use it to identify gaps where a category does not appear at any level (which is almost always a test-plan bug, not a product bug).

Component testing

What the component-test layer is responsible for. Run close to the code, fast feedback, high volume.

  • States — internal state transitions, state-machine correctness.
  • Transactions — single-component unit-of-work correctness.
  • Code coverage — structural coverage of the implementation.
  • Data flow coverage — variable definition/use pairs, data-flow anomalies.
  • Functionality — component-level feature behavior.
  • User interface — component-local UI rendering / input handling (if applicable).
  • Mechanical / signal / embedded properties — for physical products, component-level physical correctness.

Integration testing

What the integration-test layer is responsible for. Crossing component boundaries, contract verification, data flow between subsystems.

  • Component or subsystem interfaces — contract verification, schema compliance.
  • Functionality — feature behavior that spans components.
  • Capacity and volume — subsystem-level load behavior.
  • Error / disaster handling and recovery — failure propagation across component boundaries.
  • Data quality — integrity across subsystem boundaries.
  • Performance — subsystem-level latency and throughput.
  • User interface — UI-layer integration with data layer.

System and acceptance testing

What only the full assembled system can test. The categories below are the reason system test exists — they are invisible at lower levels.

  • Functionality — end-to-end feature correctness.
  • User interface — whole-experience usability.
  • States and transactions — end-to-end workflow correctness.
  • Data quality — persistent-store integrity and recovery.
  • Operations — backup/restore, runbook correctness, incident response.
  • Capacity and volume — whole-system load behavior.
  • Reliability, availability, stability — uptime against SLO.
  • Error / disaster handling and recovery — full-system failure and recovery paths.
  • Stress — beyond-peak and illegal-input behavior.
  • Performance — end-to-end latency and throughput.
  • Date and time handling — calendar correctness in context.
  • Localization — locale-specific correctness in context.
  • Networked and distributed environment behavior — behavior across network topologies.
  • Configuration options and compatibility — cross-configuration correctness.
  • Standards compliance — regulatory, accessibility, security standards.
  • Security and privacy — full-surface security posture.
  • Environment — deployment-environment correctness.
  • Installation, cut-over, setup, and initial configuration — first-run correctness.
  • Documentation and packaging — docs accuracy and operator-readiness.
  • Maintainability — post-release operational burden.
  • Alpha, beta, and other live tests — controlled-exposure pre-release validation.


3. Categories added since 2002

Seven risk categories that were either absent from the original taxonomy or subsumed into generic "non-functional" risk. Each of these has earned a named slot in the current-era list because programs that ignore them get publicly bitten.

Accessibility

Failures to meet accessibility standards (WCAG 2.2, Section 508, EN 301 549, ADA case law). In 2002 this sat inside Usability; in 2026 it is a distinct legal and regulatory exposure with its own testable surface (screen-reader behavior, keyboard navigation, contrast, motion/animation preferences, ARIA semantics, cognitive load, localization of accessibility affordances). Enterprise programs that ignore it invite lawsuits.

Observability

Failures in the instrumentation that would let an operator tell whether the system is working in production. Missing / mis-tagged / too-noisy logs, metrics without cardinality controls, spans that do not cross service boundaries, dashboards that contradict each other. This is distinct from Operations — Operations is "can we run it," Observability is "can we tell what it is doing." Pair with production-telemetry-driven test authoring (see building-quality-in whitepaper).
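Cardinality blowups are the most common self-inflicted failure in this category. A toy guard, not any real metrics library's API, showing the shape of the control:

```python
class BoundedCounter:
    """Toy counter metric that refuses new label values past a cap.

    Without a cap, a label fed from unbounded input (user ID, URL path,
    error string) creates one time series per distinct value.
    """
    def __init__(self, max_label_values: int = 100):
        self.max = max_label_values
        self.counts: dict[str, int] = {}
        self.overflow = 0  # increments rejected for exceeding the cap

    def inc(self, label: str) -> bool:
        if label not in self.counts and len(self.counts) >= self.max:
            self.overflow += 1  # aggregate instead of exploding series
            return False
        self.counts[label] = self.counts.get(label, 0) + 1
        return True
```

Real pipelines (Prometheus, OpenTelemetry collectors) apply equivalent limits downstream; the risk item is shipping a label, such as a raw user ID, that defeats them.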

Cost and financial operations (FinOps)

Failures that cause unexpected cost blowouts — runaway background workers, recursive retry storms, uncapped logging, uncapped LLM-API calls, storage leaks, N+1 query patterns in production traffic. For cloud-native systems this is a first-class quality risk: a correctness-passing release can still cause a business-critical incident via the monthly bill.
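Retry storms are the canonical item here. One standard mitigation, sketched as capped exponential backoff with full jitter (the constants are illustrative):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Yield one delay per retry: exponential growth, capped, fully jittered.

    The retry count and cap bound the worst-case cost per failing request;
    the jitter de-synchronizes clients so a mass failure does not become a
    synchronized retry storm against the recovering dependency.
    """
    for attempt in range(max_retries):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))
```

The testable risk item is the absence of the cap and the retry budget, not the exact constants.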

Supply chain

Failures introduced via dependencies — vulnerable packages, transitive dependency drift, build-system compromise, dependency-confusion attacks, malicious maintainer takeover, SBOM inaccuracy. Includes the CI/CD pipeline itself and the ecosystems it pulls from. In 2002 this was a subset of Compatibility; in 2026 it is its own surface with its own tooling (SCA, SBOM attestation, provenance verification, pinned dependencies).
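The cheapest control in this category is mechanical. A toy pin check over requirements-style lines, a stand-in for real SCA tooling rather than a replacement for it:

```python
import re

# Matches "name==exact.version"; anything else (ranges, bare names) is unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[\w.]+")

def unpinned(requirements: list[str]) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    return [r for r in requirements if not PINNED.match(r.strip())]

print(unpinned(["requests==2.32.3", "flask>=2.0", "numpy"]))
# ['flask>=2.0', 'numpy']
```

Pinning alone does not verify provenance; pairing pins with hash checking and SBOM attestation covers the rest of the surface described above.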

AI-system accuracy and calibration

For systems whose primary job is inference (classification, generation, recommendation, forecasting), the output is probabilistic — correctness becomes a distribution, not a pass/fail. Risk items: held-out evaluation set accuracy, calibration (does 80% confidence mean 80%?), performance on adversarial / out-of-distribution / long-tail inputs, regression against specific demographic slices. Cannot be tested with example-based assertions; requires eval-set testing, golden-set testing, and slice-based metrics.
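The calibration question in parentheses is directly computable. A minimal expected-calibration-error (ECE) sketch over a held-out eval set:

```python
def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: per-bin |accuracy - mean confidence|, weighted by bin size.

    confidences: predicted probabilities in [0, 1]; correct: booleans
    marking whether each prediction matched the label.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / total * abs(accuracy - avg_conf)
    return ece
```

A well-calibrated model scores near zero; a model that says 80% and is right 60% of the time shows up as a 0.2-sized gap in its bin.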

AI-system safety, integrity, and alignment

Distinct from generic Security: prompt injection, indirect prompt injection via retrieved documents, over-trust of tool outputs, jailbreaking, training-data poisoning, model-supply-chain risk, hallucination on high-stakes outputs, bias/fairness failures with demonstrable disparate impact. For LLM-backed systems, most of OWASP LLM Top 10 lives here rather than under Security.

Explainability and auditability

Failures to produce a defensible account of why the system did what it did — in regulated contexts (credit, hiring, healthcare, underwriting, insurance pricing, immigration, education), increasingly mandatory. Includes decision logs, feature attribution, lineage of data and model versions, and the ability to reconstruct a specific production decision six months later. Pair with the Cost-of-Exposure and Compliance risks inside Security / Privacy.


4. How to use this list

A completeness check, walked top to bottom before the FMEA session closes.

  • Walk the 16-category flat list at the top of the session. For each, ask "does this category contain any items that apply to this release?" If yes, the team drafts items. If no, log an explicit "not applicable because…" — the explicit negative answer is the completeness signal.
  • Cross-check against the per-test-level view. Every risk category that has items should map to at least one test level with a plan to cover it. Categories with items but no test level mapped = test-plan gap.
  • Apply the 2026 additions as a second pass. For any modern system, at least three of the seven will apply; for AI-backed systems, usually five or more.
  • Map the final category list to ISO/IEC 25010:2023 characteristics if a formal framework is required for the engagement (regulated buyer, compliance audit, procurement scoring). Both views coexist — the named categories drive the FMEA items, the ISO mapping supports external reporting.
  • Treat the list as living. When a new failure class bites an engagement, propose a category addition in the next methodology review. The 2002 list got us 24 years; no taxonomy survives forever.
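The cross-check in the second step is mechanical enough to automate. A sketch, with a hypothetical data shape mapping each category that has items to the test levels planned to cover it:

```python
# Hypothetical shape: category with drafted items -> test levels that cover it.
coverage = {
    "Functionality": ["component", "integration", "system"],
    "Localization": ["system"],
    "Security and Privacy": [],  # items drafted, but no level owns them yet
}

# A category with items but no mapped level is a test-plan gap, not a product bug.
gaps = sorted(cat for cat, levels in coverage.items() if not levels)
print(gaps)  # ['Security and Privacy']
```

Running this at the end of the FMEA session turns the completeness signal into a hard gate rather than a reviewer's impression.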

Where each category is first catchable

System test exists for the bottom category.

Count of the 16 categories that are first cost-effectively catchable at each test level. A category catchable at component is usually cheapest to catch there; pushing it later is more expensive. Any category that never appears at any level is a test-plan gap.

Category count by first-catchable level

Risk categories × test level

Categories count once at the lowest level where they're first catchable.

System and acceptance: 7 · Component: 5 · Integration: 4

System and acceptance catches the most categories because many (localization, usability, installation, compatibility, reliability, stress, recovery) are invisible at lower levels — they require the fully-assembled system.

Take it with you

Download the piece you just read.

We keep this library free. All we ask is that you tell us who you are, so we know who to follow up with if we release an updated version. One-time form; this browser remembers you after that.

Need a QA program to back this up in your organization?

If a checklist is not enough and you want help applying it to a live engagement, we can have a call this week.
