The 137-row skills matrix for managing a test function's capability.
Four roles, three domains, per-person ratings, team minimum and average — the snapshot finance and HR will both use.
A comprehensive test-team skills assessment matrix covering 137 specific skills across three domains (Testing Skills, Domain Knowledge, Technical Expertise) and four roles (Test Technician, Manual Test Engineer, Automated Test Engineer, Test Manager). Each skill is tagged Required or Desirable per role and rated on a 0–3 competency scale; per-person ratings roll up to team-minimum and team-average views, making capability gaps and redundancy visible for hiring, development, and succession planning.
A test team's capability is not a single number. It is a matrix of 100+ skills across several roles, and its two most important views are (a) the team minimum — where is nobody capable? — and (b) the team average — where is the team thin enough that a single resignation breaks coverage? The skills matrix is the artifact that makes both visible. Using it quarterly turns team-building from a gut-feel activity into a capability-portfolio discipline.
Key Takeaways
Four things to remember.
Role minimums, not averages, define the hiring spec
When you post an ATE position, the minimums column for ATE tells the recruiter what "required" means: these skills must be present, full stop. Desirables are negotiable. Averages inform training plans, not hiring screens.
Team minimum is the weak-link view
A row where team minimum = 0 means no one can cover that skill. These are the failure modes that surface as emergencies ("no one knows our performance-testing stack; Alice is on leave"). Fix by cross-training or by planning redundancy during hiring.
Team average is the depth view
A row where team average = 1.0 means the skill is technically present but thin. These are capability investments: build depth through assignments, paired work, or training before they become emergencies.
Quarterly cadence, not one-time snapshot
The matrix is maintained, not filed. Quarterly re-rating captures people's growth (and sometimes decline when a skill is no longer exercised). Show the movement over four quarters; the pattern is the team-health signal.
Why this exists
What this template is for.
The downloaded .xls includes the full 137-row matrix with two worked examples: a four-person enterprise team rated at two points in time (before and after an additional hire) and a five-person variant that includes the Test Technician (TT) role for a larger team. Use either as a starting point; both the role set and the skill rows are designed to be edited.
The column reference below documents each of the 137 skills by the three domains. The instructions explain how to rate, how to roll up, and how to use the output for hiring, development, and succession.
The columns
What each field means.
Education and certifications, captured as text, not 0–3. Degree subject area matters (CSE / CS / Engineering / Math / Business / Domain-specific). Certifications: ISTQB, ISEB, CSQE, TMMi, CMM-related, product-specific. "Other" captures role-relevant credentials (CPA for financial testing, LFC / GLBA / HIPAA credentials for regulated testing).
Years per experience category. Role minimums are typically 5R (five years, Required) for senior positions (MTE/ATE/TM) and D for TT. Domain and non-domain years tell you where this person's intuition comes from.
Eight skills, 0–3 scale, with minimum R/D by role. These are the skills that separate a technically competent but unreliable engineer from a senior one. The skills rated low here are the hardest to develop and the most disruptive when absent.
Five general rows. Foundational literacy; everyone on the team should be at least 1 here, and TMs at 2–3.
Five planning rows. Most important for TMs (Required at 2–3); Desirable for MTE/ATE, but strong planning skills accelerate leadership-track ATEs and MTEs.
Design-technique coverage. Structural is mostly ATE; static is mostly MTE. Property-based and AI-system evaluation are 2026 additions reflecting the shift away from exclusively traditional, scripted testing.
The ATE home domain. Required at 3 for ATE. Tool names should be edited per your 2026 stack (Playwright / Cypress / WebdriverIO / Pact / k6 / Locust / OTel-integrated).
Shared across roles. Version control and CI/CD are table stakes (R at 1+); orchestration (Testcontainers, K8s ephemeral envs) is ATE-weighted.
Execution discipline. Every tester needs 3 on Bug Reporting and Bug Isolation — these are the deliverables. Status Reporting and Metrics weight toward TM but everyone contributes.
Per-person roll-up across the ~31 testing-skill rows. Comparable across the team; used to compare tenure-matched peers or track growth over time.
The worksheet ships with a word-processing / document-management example (21 rows). Delete it; substitute your product's 15–25 domain rows. Categories to include: product area coverage, customer workflow literacy, regulatory / compliance understanding, platform-specific conventions (iOS / Android / Web / Cloud).
The engineer-craft domain. 2026 substitutions vs. the 2002 source: add cloud (AWS/GCP/Azure competency), observability stack (OTel, Datadog, Honeycomb), security (OWASP Top 10, SAST/DAST tooling, IAM basics), AI/ML literacy (prompt-engineering for eval, LLM-assisted test authoring, eval-harness design).
Per-person roll-up across technical rows. Together with Testing Skills average, distinguishes T-shaped seniors (high on both) from role-specialists (high on one).
Rated 0–3 (0 = no knowledge, 1 = some, 2 = knowledgeable, 3 = expert), or "Yes"/"No" for binary skills, or text for education/credentials. Rate annually or quarterly; keep the history.
Lowest rating across team members on that skill. A zero here = no coverage; a 1 = thin coverage. Highlight reds/yellows for review.
Mean rating. Compare against role-weighted target (weight by who should be strong on this skill).
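For teams that keep ratings outside the worksheet, here is a minimal roll-up sketch in Python. It is not the worksheet's own formulas; the skill names, people, and scores are illustrative.

```python
# Minimal roll-up sketch: per-skill team minimum and average, plus a
# per-person average across rows. All names and numbers are illustrative.
ratings = {
    "Bug Reporting":              {"Alice": 3, "Bob": 3},
    "Property-based / Fuzz":      {"Alice": 1, "Bob": 2},
    "Observability stack (OTel)": {"Alice": 1, "Bob": 3},
}

# Per-skill roll-ups: the weak-link view (min) and the depth view (avg).
for skill, by_person in ratings.items():
    scores = list(by_person.values())
    print(f"{skill:30s} min={min(scores)} avg={sum(scores) / len(scores):.1f}")

# Per-person roll-up across these rows (the per-domain average analogue).
people = {p for row in ratings.values() for p in row}
for person in sorted(people):
    scores = [row[person] for row in ratings.values() if person in row]
    print(f"{person:8s} avg={sum(scores) / len(scores):.2f}")
```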
Role minimums across domains
Required profile per role.
Each bar is that role’s minimum required competency (on the 0–3 scale) across the three domains. TMs are Testing-Skills-heavy and lighter on Technical. ATEs are the opposite. Domain Knowledge is broadly required and customized per product.
Role minimums · 0–3 scale, required level
Role profile across the three domains
Adapted from the 137-row template. Numbers are role-minimum required levels.
Hiring spec comes from the role-minimums column. Required = R at 2+. Below that on a Required skill = train or reassign. Desirables expand the role.
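As a sketch of how the role-minimums column becomes a hiring screen: encoding each cell as a (Required/Desirable, level) pair is this example's assumption, not the worksheet's native format, and the rows echo the sample table below.

```python
# Sketch: turn an ATE role-minimums column into a hiring screen.
# The (R/D, level) encoding and the row values are illustrative.
ate_minimums = {
    "API testing / Contract testing": ("R", 3),
    "CI/CD pipelines":                ("R", 2),
    "Property-based / Fuzz testing":  ("R", 2),
    "Estimation (planning)":          ("D", None),
}
team_min = {  # current team-minimum column for the same rows
    "API testing / Contract testing": 2,
    "CI/CD pipelines":                2,
    "Property-based / Fuzz testing":  1,
    "Estimation (planning)":          1,
}

# Required = must be present in the hire; Desirable = negotiable.
required  = [s for s, (rd, _) in ate_minimums.items() if rd == "R"]
# Rows where the team minimum sits below the role minimum are the
# gaps a new hire should lift first.
gap_lifts = [s for s in required if team_min[s] < ate_minimums[s][1]]
print("Screen as hard requirements:", required)
print("Prioritize candidates who lift:", gap_lifts)
```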
Live preview
What it looks like populated.
Skills matrix skeleton (abbreviated). The downloaded worksheet has the full 137 rows.
| Skill | TM min | MTE min | ATE min | Person A | Person B | Team min | Team avg |
|---|---|---|---|---|---|---|---|
| Testing Standards | 3R | 2R | 2R | 3 | 2 | 2 | 2.5 |
| Estimation (planning) | 3R | D | D | 3 | 1 | 1 | 2.0 |
| Quality Risk Analysis | 3R | 2R | 2R | 3 | 2 | 2 | 2.5 |
| Behavioral (black-box) design | 2R | 3R | 2R | 3 | 3 | 3 | 3.0 |
| Bug Reporting | 3R | 3R | 3R | 3 | 3 | 3 | 3.0 |
| API testing / Contract testing | D | 2R | 3R | 2 | 3 | 2 | 2.5 |
| Property-based / Fuzz testing | D | D | 2R | 1 | 2 | 1 | 1.5 |
| AI-system evaluation | D | 1R | 2R | 1 | 2 | 1 | 1.5 |
| CI/CD pipelines | D | 1R | 2R | 2 | 3 | 2 | 2.5 |
| Observability stack (OTel / APM) | D | 1R | 2R | 1 | 3 | 1 | 2.0 |
| Cloud platform (AWS/GCP/Azure) | 1R | 1R | 2R | 2 | 3 | 2 | 2.5 |
| Security fundamentals (OWASP) | 1R | 1R | 1R | 2 | 2 | 2 | 2.0 |
Team min vs. team average · sample skills
Weak-link and depth views.
A two-person team rated across eight representative skills. Where team minimum is 0–1, the team is one illness or one resignation from a crisis. Where team average is 1.5 or lower on a broadly required skill, the skill is present but thin. Both need investment — reds before yellows.
Team-avg column · sample program
Team average rating on sample skills
Red-zone bars (team avg ≤ 1.5) are thin-depth skills — invest before they turn into team-minimum zeros.
The 2026 additions (AI-system eval, property-based, observability, cloud, security) are the most commonly under-invested. Teams that fund these outperform teams that don't.
How to use it
8 steps, in order.
1. Customize the row set. Start with the 137-row template; delete rows that do not apply to your product (e.g., drop the mainframe row if you have no mainframe exposure) and add rows for your 2026 stack (Testcontainers, Playwright/Cypress, contract-testing framework, OTel, LLM API evaluation).
2. Set per-role minimums (R / D) based on your organization's role definitions. If you use different role labels (e.g., SDET vs. ATE, QE vs. TM), rename the columns; keep the minimum-setting discipline.
3. Rate every team member on every row. A blank cell means "not yet rated"; a 0 is a deliberate "no knowledge." Budget 90 minutes per person the first time; 30 minutes per quarter for updates.
4. Roll up team minimum and team average for every row. These two columns are what you act on.
5. Identify the red squares: team minimum = 0 on an R skill for that role. These are the highest-priority cross-training targets and hiring filters.
6. Identify the yellow squares: team minimum = 1 on an R skill, or team average ≤ 1.5 on a skill required across the team. These are the thin spots; target them in the next quarter's learning goals, paired work, or stretch assignments (see the flagging sketch after this list).
7. Use the output as input to hiring. When you open a requisition, the skills matrix tells you what "required" means (rows with team minimum = 0 on R) and what would be great to have (rows with team average ≤ 1.5 that a new hire would lift).
8. Rate quarterly. Trending four quarters of team average and team minimum tells you whether the team is growing, declining, or static on each capability dimension.
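The flagging sketch referenced in steps 5–6, assuming the roll-up columns are already computed. The rows and thresholds follow the red/yellow definitions above; the data is illustrative.

```python
# Sketch of steps 5-6: flag red and yellow squares from the roll-up
# columns. "required" here means R for at least one role on the team.
rows = [
    # (skill, required, team_min, team_avg) -- illustrative values
    ("Bug Reporting",               True,  3, 3.0),
    ("Property-based / Fuzz",       True,  1, 1.5),
    ("AI-system evaluation",        True,  1, 1.5),
    ("Performance-testing stack",   True,  0, 0.5),
    ("Mainframe exposure",          False, 0, 0.0),
]

# Red: no coverage at all on a Required skill.
red    = [s for s, req, tmin, _ in rows if req and tmin == 0]
# Yellow: thin coverage (min = 1) or thin depth (avg <= 1.5) on a Required skill.
yellow = [s for s, req, tmin, tavg in rows
          if req and s not in red and (tmin == 1 or tavg <= 1.5)]
print("Red (cross-train or hire now):", red)
print("Yellow (next quarter's learning goals):", yellow)
```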
The rating scale · deliberately coarse
Zero through three.
Finer scales invite calibration disputes without adding signal. Four levels. R/D overlay for required vs. desirable.
Methodology
The thinking behind it.
The 0 / 1 / 2 / 3 scale is deliberately coarse. Finer scales (5-point, 10-point) invite calibration disputes without adding signal. 0 = I cannot do this; 1 = I have done this with support; 2 = I can do this unsupervised; 3 = I can teach this to others.
The R / D distinction (Required / Desirable) is load-bearing. Required skills for a role must be at 2+ for someone in that role to be autonomous. Desirable skills expand the role. A role-player below the required level on a Required skill needs training or reassignment; a role-player above their R on many Desirables is promotion-track.
The three domains (Testing, Domain, Technical) correspond to the three competency axes of a complete test-function engineer. Testing is role-specific craft; Domain is customer / product empathy; Technical is the platform literacy that prevents testing from being a black box. Most teams are strong in one, competent in one, weak in one — the matrix shows which.
In 2026 the most common gap is the Technical axis — specifically cloud / observability / security / AI-system evaluation. The 1998–2010 test profession grew a rich Testing-axis skill base but under-invested in Technical. Teams that fund Technical-axis growth (AWS/Azure certifications, OTel workshops, OWASP training, eval-harness design) consistently outperform teams that don't.
Common failure mode: using the matrix as an annual HR ritual rather than a planning artifact. The matrix loses signal if it is updated once a year; updated quarterly, it becomes the team-health telemetry that feeds into the test policy (see the test-policy-template), the team-building process, and the hiring-and-developing-staff discipline.
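A sketch of the quarterly-telemetry idea: four quarters of team-average history per skill, reduced to a direction signal. The skills and numbers are illustrative, not from the worked examples.

```python
# Quarterly telemetry sketch: trend the team-average column over four
# quarters per skill. Illustrative data; decline usually means a skill
# is no longer being exercised.
history = {
    "AI-system evaluation": [0.5, 1.0, 1.0, 1.5],
    "Performance testing":  [2.0, 2.0, 1.5, 1.0],
}

for skill, quarters in history.items():
    delta = quarters[-1] - quarters[0]
    trend = "growing" if delta > 0 else "declining" if delta < 0 else "static"
    print(f"{skill:22s} {quarters} -> {trend}")
```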
Take it with you
Download the piece you just read.
We keep this library free. All we ask is that you tell us who you are, so we know who to follow up with if we release an updated version. One-time form; this browser remembers you after that.
Related in the library
Pair this with.
Need a QA program to back this up in your organization?
If a checklist is not enough and you want help applying it to a live engagement, we can have a call this week.
Related reading
Articles, talks, guides, and case studies tagged for the same audience.
- Whitepaper
Evaluation Before Shipping: How to Test an AI Application Before It Hits Production
The release-gate playbook for AI features. Covers the five evaluation dimensions, how to build a lean golden set, where LLM-as-judge is trustworthy and where it lies, rollout mechanics with named exit criteria, and the regression suite that keeps a shipped AI feature from quietly rotting in production.
Read →
- Whitepaper
Choosing the Right Model (and Knowing When to Switch)
A practical framework for matching LLM model tier to task. Covers the four axes (capability, latency, cost, reliability), cascade routing patterns that cut cost 60 to 80 percent without measurable quality loss, switching costs you did not plan for, and the worked economics at 10K, 100K, and 1M decisions per day.
Read →
- Whitepaper
Beyond ISTQB: A Multi-Domain Certification Roadmap for Technical L&D
Most engineering L&D programs over-index on a single certification family (usually ISTQB on the QA side, AWS on the infrastructure side) and under-invest across the rest of the technical domains the org actually needs. This paper covers a multi-domain certification roadmap (QA, AI, cloud, data, security, project management, software engineering) with sequencing logic for each level of the engineering ladder, plus the maintenance discipline that keeps the roadmap relevant as the technology shifts underneath it.
Read →
- Guide
The ISTQB Advanced Level path, mapped
The Advanced Level landscape keeps changing — CTAL-TA v4.0 shipped May 2025, CTAL-TM is on v3.0, CTAL-TAE is on v2.0. This guide maps all four core modules, prerequisites, exam formats, sunset dates, and which module a given role should take first. Links directly to the authoritative istqb.org syllabi.
Read →
- Whitepaper
Bug Triage: A Cross-Functional Framework for Deciding Which Defects to Fix
Bug triage is the cross-functional decision process that converts raw defect reports into prioritized action. Done well, it optimizes limited engineering capacity against risk; done poorly, it becomes a backlog-management ritual that neither fixes the important defects nor drops the unimportant ones. This whitepaper covers the triage process, the participants, the six action outcomes, the four decision factors, and the governance disciplines that keep triage effective in continuous-delivery environments.
Read →
- Whitepaper
Building Quality In: What Engineering Organizations Do from Day One
Testing at the end builds confidence, but the most efficient quality assurance is building the system the right way from day one. This whitepaper covers the upstream disciplines — requirements clarity, lifecycle selection, per-unit programmer practices, and continuous integration — that make system-level testing cheap and fast rather than the only thing holding a release together.
Read →
Where this leads
- Service · Quality engineering
Software Quality & Security
Independent test programs, security testing, and quality engineering for systems where defects cost real money.
Learn more →
- Solution
Risk Reduction & Clear Decisions
Quality programs and decision frameworks that shift risk discussions from anecdote to evidence.
Learn more →
- Solution
Reliable Software at Scale
Quality engineering programs for organizations whose software is now operationally critical.
Learn more →