The test manager's role during test execution is to manage a search for the unexpected. It is the only management role in the software organization whose core responsibility is searching for the unexpected. A competent test manager has to run that search crisply, in the face of surprise findings and shifting plans, and still produce an honest, timely read on quality — cycle after cycle. This paper covers the seven-step execution process, a worked cycle, the six quality indicators, and the challenges every execution effort has to navigate.
Pairs with the Test Execution Process checklist in the QA Library — a printable one-pager you can take into a standup or a status meeting.
Scope: what this paper is (and isn't) about
Test execution touches everything. If we try to pull on that thread we'll end up discussing the entire project management organization. This paper narrows the scope to the internal processes the test team performs to run tests and report results. We assume the release-management process has already delivered a build into the test environment, ready for testing.
Narrowing the scope doesn't shrink the stakes. When execution is running, the spotlight is on the test team. Failing to execute this internal process deftly produces fuzzy, incomplete, or inaccurate status — which is how test organizations lose credibility fastest.
Definitions
Different shops use different vocabulary. Here is the vocabulary this paper uses:
- Test step. A short action — entering a page of data, clicking through a flow — that produces a test condition.
- Test condition. An interesting situation in which the system is exercised and its behavior, response, or output can be checked for validity.
- Test case. A collection of test steps designed to exercise a small number of related conditions.
- Test suite. A collection of related test cases.
- Test cohort. All the test suites that apply to the current test phase (system test, UAT, regression, hardening, etc.).
- Test cycle. A selection of suites — often a subset of the cohort — run against a particular build.
- Test pass. A complete run through the cohort: every case in every suite, either in one cycle or spanning multiple.
The seven-step reference execution process
1. Based on an overall quality risk management strategy, select a subset of test suites from the cohort for this cycle.
2. Assign the test cases in each suite to testers for execution.
3. Execute tests, report bugs, and capture status continuously as the cycle runs.
4. Resolve blocking issues as they arise.
5. Report status, adjust assignments, and reconsider plans and priorities daily.
6. Manage the end game — when the cycle runs out of time, eliminate unrealizable cases in reverse-priority order (lowest first, highest last).
7. Report cycle findings at the end of each cycle.
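The selection step (step 1) and the end-game cut (step 6) can be sketched in a few lines of Python. This is a hypothetical helper, not a prescribed tool: it assumes each case carries a numeric priority (1 = highest risk) and an effort estimate.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    priority: int        # 1 = highest risk; larger numbers = lower risk
    est_hours: float

def select_cycle(cohort: list[TestCase], budget_hours: float) -> list[TestCase]:
    """Step 1: fill the cycle with the highest-risk cases that fit the budget."""
    selected, used = [], 0.0
    for case in sorted(cohort, key=lambda c: c.priority):
        if used + case.est_hours <= budget_hours:
            selected.append(case)
            used += case.est_hours
    return selected

def drop_for_end_game(remaining: list[TestCase], hours_left: float) -> list[TestCase]:
    """Step 6: eliminate unrealizable cases in reverse-priority order, lowest first."""
    kept = sorted(remaining, key=lambda c: c.priority)
    while kept and sum(c.est_hours for c in kept) > hours_left:
        kept.pop()  # after the sort, the last element is the lowest-priority case
    return kept
```

The point is not the code but the discipline: selection and cuts both follow the risk ranking, never convenience.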
The next section walks through a hypothetical cycle using this process. The following sections name the quality indicators and the common challenges.
A worked cycle
Suppose you're testing release 1.1 of a web-based document editor called SpeedyWriter. You're in the system test phase and the first build was just installed. Your cohort has four suites — functionality, performance and stress, error handling and recovery, and localization — each with a handful of cases. In real life you'd have far more suites and more, smaller cases; this example is deliberately simple enough to fit on a page.
You need a tracking surface. In the simplest version of this, a spreadsheet per suite lists each test case with a row for its current state (planned → ready → in progress → pass / fail / blocked), the tester assigned, the bug IDs it produced, the planned and actual date, the hours spent, and room for comments. In a modern setup the same data lives in your test-management tool (Xray, Zephyr, TestRail, qTest, or equivalent) — but the information model is the same.
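That information model is small enough to sketch. The field names and the transition table below are assumptions for illustration, not any particular tool's schema.

```python
from dataclasses import dataclass, field

# Legal state transitions for a tracked case (an assumed model for illustration).
TRANSITIONS = {
    "planned":     {"ready"},
    "ready":       {"in progress"},
    "in progress": {"pass", "fail", "blocked"},
    "blocked":     {"in progress"},   # resume once the blocking issue clears
    "fail":        {"in progress"},   # re-test after a fix arrives
}

@dataclass
class TrackedCase:
    name: str
    tester: str = ""
    state: str = "planned"
    bug_ids: list[str] = field(default_factory=list)
    hours_spent: float = 0.0

    def move_to(self, new_state: str) -> None:
        """Reject transitions the model doesn't allow, e.g. fail -> pass with no re-test."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

Whether this lives in a spreadsheet or a tool, the payoff is the same: every case is in exactly one known state at all times.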
Selecting the cycle's scope. Before the cycle starts, you pick the suites (and cases) in priority order based on a quality risk assessment:
- File functionality and robustness are the most customer-visible risks.
- Table functionality historically surfaces serious bugs that take long to fix — run these early so fixes land in time.
- Editing is the next-most-critical functional area.
- Data loss on server crash is a current-release support headache — re-run the recovery tests.
- Font and printing features are the next priority.
- Performance has been partially covered in integration testing and is lower-risk this release.
- Localization can ship after English is released, so it gets bumped to the following cycle.
Each case is assigned to a specific tester, with planned effort derived from the previous release's actuals. Performance tests — automated and running for 24 hours — exclusively hold the environment for a dedicated window.
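The assignment step itself is mechanical once the estimates exist. A hypothetical sketch of greedy load balancing, with invented tester names in the usage below:

```python
def assign_cases(cases: list[tuple[str, float]],
                 testers: list[str]) -> dict[str, list[str]]:
    """Give each case (name, estimated hours) to the currently least-loaded
    tester, largest cases first, so workloads stay roughly even."""
    load = {t: 0.0 for t in testers}
    plan: dict[str, list[str]] = {t: [] for t in testers}
    for name, hours in sorted(cases, key=lambda c: -c[1]):
        tester = min(load, key=load.get)   # ties resolve to the first-listed tester
        plan[tester].append(name)
        load[tester] += hours
    return plan
```

In practice you would weight by skill and environment access too, and the estimates come from the previous release's actuals, as above.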
Day one. The assigned tester starts with file functionality and discovers file creation is seriously broken: new files are corrupted. She writes the bug report, updates the case, and moves to the file-corruption recovery test. Mid-way through, the environment goes down unexpectedly — an infrastructure issue unrelated to the build. She loses an hour resolving the blocking issue, tests for another couple of hours, then closes the day. You roll up the results, review, and send a summary to your manager. The plan still holds; you continue.
Day two. The recovery testing is productive — six separate problems, six reports — and the simulated crash from day one (inadvertent, but real-world) exposed a serious problem she otherwise would have missed. She updates the case to capture the new condition. The system is buggier than expected, so the schedule is slipping. You add yourself as a tester for an afternoon and shift the performance window a day later. You drop printing tests for this cycle and reschedule for cycle two.
Mid-cycle. On day three everything goes to plan. On day four, the performance engineer starts server tests. On the morning of day five she finds them hung overnight — she hadn't read the file-creation bug report and hadn't adjusted her scripts. She reworks, restarts, and finds performance-impacting bugs of her own. You drop the lowest-priority suite (platform-specific performance) to stay within the cycle window.
End game. Sunday morning you wrap the cycle, produce final cycle reports, and assign cases for cycle two — adding two hours per case to reflect the buggier-than-expected system. Printing and platform-specific performance (dropped from cycle one) go first, followed by localization. The release engineer enters the lab in the afternoon to prepare the next build.
That's the process running normally — adjusting to findings, blocked environments, surprise priorities, miscommunications, and schedule pressure, without losing track of state. The next section names the properties that let this kind of cycle run.
Six quality indicators for a good execution process
Finds the scary stuff first
The nastiest bugs are the ones that cost the most time to fix. The execution process has to lead with them, both within cycles and across the phase. That ordering comes from a quality risk analysis — formal or informal — that ranks potential failures in priority order. In the worked cycle above, global risks (file, edit) were run first; isolated or customer-segment-specific risks (printing, localization) came later.
Supports crisp communication of findings
Testing produces information the project management team uses to make quality decisions. The execution process has to capture those findings in clear reports and circulate them to the right people. In the worked cycle, the test manager pushed a nightly summary to her manager and the team — and the process partially broke when one tester didn't read another's report and wasted a shift of automated testing. The fix is lightweight: peer review of bug reports and status reports, and a published convention that every tester reads the running bug list at the start of their shift. Modern tools help here — Slack channels keyed to the release, triage bots, or daily auto-generated cycle summaries posted into the team channel.
Has measurable progress attributes
Two metrics matter in execution: bug detection rate and test case evaluation rate. For bug detection, a defect-removal model (even a simple one built from previous-release data) gives you an expected find curve; comparing actual against expected tells you whether the system is buggier, about right, or surprisingly clean. For test case completion, the project-management staples apply — milestones achieved vs. planned, effort expended vs. planned, and the combined view (are you hitting the milestones you expected given the effort you spent?). A cycle dashboard that shows these three lines is enough; most modern test-management tools (Xray, Zephyr, TestRail) have this view built in, and you can also roll one in Looker, Grafana, or a simple notebook if you want custom metrics.
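The bug-detection comparison reduces to a few lines once previous-release data exists. The 20% tolerance below is an assumption for illustration, not an industry constant.

```python
def detection_signal(expected: list[int], actual: list[int],
                     tolerance: float = 0.2) -> str:
    """Compare cumulative bug finds against the expected curve and return a
    coarse read on the system: 'buggier', 'on track', or 'cleaner'."""
    exp_total, act_total = sum(expected), sum(actual)
    if act_total > exp_total * (1 + tolerance):
        return "buggier"
    if act_total < exp_total * (1 - tolerance):
        return "cleaner"
    return "on track"
```

The same shape works per-day for trend lines; the worked cycle's "buggier than expected" call on day two is exactly this comparison, done informally.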
Prevents accidental overlap
Two issues here: clear assignment, and no test-case dependencies.
Clear assignment means each tester knows exactly which cases she owns. Reassignments happen as cycles progress, and the test manager has to make sure new assignments circulate. This sounds trivial until you watch two testers test the same case for three hours each while a critical case sits unclaimed.
Test-case dependencies are usually avoidable and always undesirable. The ideal property is that any test case can be run by any tester on any day based on priority, environment availability, and skill. Dependencies introduce sequencing constraints that complicate planning for no real gain. The common trap: cases that set up data for subsequent cases — pull the fixtures into setup steps so cases are independent.
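Concretely, the fix is to give every case its own setup, so any tester can run any case on any day. A minimal Python sketch with hypothetical document tests:

```python
import pathlib
import tempfile

def make_saved_document() -> pathlib.Path:
    """Setup owned by the case itself: nothing depends on a 'create' case running first."""
    doc = pathlib.Path(tempfile.mkdtemp()) / "sample.txt"
    doc.write_text("hello")
    return doc

def test_edit_document():
    doc = make_saved_document()   # independent fixture, not another case's output
    doc.write_text("hello, edited")
    assert doc.read_text().endswith("edited")

def test_delete_document():
    doc = make_saved_document()
    doc.unlink()
    assert not doc.exists()
```

In a pytest shop the same idea is a fixture; in a test-management tool it is preconditions listed on the case rather than a pointer to another case.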
Adapts to evolving circumstances
Testing is the discovery of unexpected behavior and the investigation of anomalies into actionable reports. Since the behaviors are unexpected and the bug count is unpredictable, the process generates its own fluid state. On top of that, external events — delayed builds, scope adjustments, shifted priorities — add more change. You cannot prevent any of this. The process has to accommodate it: reshuffled assignments, revised effort estimates, dropped low-priority cases, extra resources brought in.
Captures data for continuous improvement
While the testware tests the system, the system tests the testware. A cycle is a source of data about the quality of your test cases, your test design, and your process itself — not just the system under test. In the worked cycle, a tester captured a new test condition on the fly (inspired by an infrastructure-caused outage); the test manager adjusted per-case execution times based on observed actuals. Good execution processes leave the testware better at the end of the cycle than at the start.
Six challenges every execution effort has to navigate
Balance progress and certitude
Researching a bug or chasing an unusual behavior takes unpredictable effort. The actual person-hours to complete a set of cases can deviate considerably from plan. As a manager you have to help the team balance forward progress through the cases against knowing, with acceptable confidence, what the findings mean. Not every bug deserves the same depth. An automated case that hangs once and runs cleanly on retry is probably not worth postponing other cases to investigate. A sporadic database-corruption bug is worth postponing most of the cycle to isolate.
Keep reports accurate, consistent, and timely
Test findings evolve constantly. If you spend two hours preparing slides for a two-hour project status meeting, your report is stale by the time the meeting ends. If your managers need precise counts of cases and bugs, you have to work harder on accuracy — which means your numbers are even more out of date. Get guidance from the people receiving the report. Once they understand the trade-off between how-fresh and how-precise, they'll usually pick the right balance for the situation. Modern real-time dashboards (Looker, Metabase, Grafana fed by your test-management tool's API) remove a lot of this friction, but they don't eliminate the underlying trade-off.
Ensure testers interpret results correctly
Test execution looks for mismatches between expected and observed behavior. But does a reported mismatch always reflect the system's actual quality? Not always. Our powers of observation and judgment are imperfect, our expectations of correct behavior are sometimes wrong, and inscrutable software can hide evidence of correct or incorrect behavior. Any of these three can produce a false pass or a false fail.
Three controls:
- Peer review for observation and judgment issues. Have senior testers review junior testers' findings, and assign the same cases to different testers across cycles.
- Unambiguous requirements for the oracle problem. Without a reliable oracle — a way to predict correct behavior — err on the side of reporting. Disputed bugs go to a cross-functional review (product, support, business), not the development team alone.
- Escalate the inscrutability problem. Only senior management can fix "the software makes it hard to see what it's doing." Get the test team involved in design and development so the system is testable by construction.
Write good bug reports
Bug reports are the tangible product of test execution and the key communication channel to developers, peer organizations, and management. Their quality carries the test team's credibility. See the bug reporting process article and checklist for the ten-step process the team should use for every report.
Accept the right level of test-case ambiguity
A test case is ambiguous when two different runs could yield different results against the same software, or the same results against software that behaves differently in noticeable ways. Every test case is ambiguous to some degree. A case can read "spend four hours testing file operations" or can spell out dozens of detailed steps starting with "double-click the app icon and confirm it launches" — or anything in between.
Test-case ambiguity is a spectrum, not a binary. Detailed cases take longer to write, require exact knowledge of how the system should behave, and restrict tester judgment. Vaguer cases are faster to write and adapt to new behavior but are harder for new testers to run consistently. Automated cases that check more data catch more bugs but produce more false positives and are more expensive to maintain. The right level depends on tester experience, time available for design, the maturity of your requirements, the stability of the system, and whether the case is scripted automation or an exploratory charter.
Modern practice blends scripted automation (unambiguous, stable, deterministic, high-volume) with exploratory test charters (bounded — e.g. "two hours testing file manipulation under network instability" — but open-ended in approach). The combination is stronger than either alone.
Staying organized in crisis
Teams sometimes push back on the tracking overhead: "This seems like a lot of process. Does it work?" Yes. Applied with discipline it keeps the test operation organized and honest, no matter how chaotic the surrounding development process is. The tools and techniques in this paper are not silver bullets. They help the test manager choreograph a complex, fast-moving effort during what is normally a high-stress period in a project's lifecycle.
Execution war stories (anonymized)
A few cycle-level failures worth learning from — drawn from dozens of engagements, names removed.
Silent-area overlap. A retail web platform released a performance optimization. The test plan, divided up in an informal meeting the day before, covered the report generator but missed a second module that used the same view. The second module went to production broken — and because it was a key user-facing path, the site was effectively down for three hours on the second-busiest day of the week. Direct impact: ~$28K revenue, triple the usual support call volume, unquantified future business loss. A test matrix keyed to features (not just tickets) would have caught the assignment gap.
No subset selection. A test manager believed every release required a full regression pass. The submitted test plan showed 213 days of execution for a small release — far beyond what the organization could tolerate. Many of the cases were unrealizable in the available environment; no suite matrix tied the cases to the actual changes. The test plan was rejected and the test manager was dismissed, in part because the plan demonstrated no working notion of risk-based selection. Competitors shipped first; the company lost market share. Risk-based selection from a maintained case library — pairing each change with the cases that exercise it — would have produced a reasonable cycle window.
Blocked and abandoned. A telephony platform was divided into subsystems with distinct development teams and one shared QA group. The test team had blocking issues that needed cross-team resolution, but the business had already moved on to the next release. Test tracking consisted of a bug database where each tester was assigned a fixed slice of re-tests. There was no suite-level schedule tracking. When someone asked "are we on schedule?" the answer was "I haven't heard anything, so I think we're okay." The release shipped three weeks late — four weeks after marketing had committed to a date. Customers had already pre-ordered. The discount on undelivered pre-orders was expensive, but the real cost was the credibility loss: the whole organization learned that QA's status information wasn't reliable. A test-suite matrix and a daily cycle-level status report would have surfaced the schedule risk weeks earlier.
Implementing changes
If you've read this and decided to get your own execution process under control, here's a roadmap.
- Assess where you are. Which parts of the execution process are not under control? How much of the chaos is external (release management, environment stability) vs. internal?
- Put bug tracking in place, if you don't have it. A spreadsheet is better than email. A real tracker (Jira, Linear, GitHub Issues, ClickUp, or a test-management suite like Xray / Zephyr / TestRail that backs onto one) is better than a spreadsheet. The key property is a queryable state machine.
- Define what you intend to test and how long it takes, in bite-sized pieces. Two to four hours per case or charter is manageable. Don't assume you need tightly scripted cases — a bounded charter ("spend two hours testing file manipulation under spotty network") can suffice. Size matters more than formality.
- Put case-level tracking in place. A tracking spreadsheet or the cycle view in your test-management tool — either works. Track schedule state and findings side by side so you can see when findings and schedule diverge.
- Schedule a daily team review of execution status. A fifteen-minute standup is enough if the tooling is good. The agenda is whatever you've put in place to track cases and bugs.
- Figure out who needs daily status and in what form. Some managers want a verbal report; others want a written summary or a scheduled status meeting. Match the format to the audience. An auto-generated Slack summary from your test-management tool covers 80% of this need once set up.
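The "queryable state machine" property from the tracker step is the payoff: each daily status question should be one query away. A toy illustration with invented bug records:

```python
from collections import Counter

BUGS = [  # hypothetical records for illustration, not a tool's schema
    {"id": "SW-101", "state": "open",     "severity": 1},
    {"id": "SW-102", "state": "fixed",    "severity": 2},
    {"id": "SW-103", "state": "verified", "severity": 3},
    {"id": "SW-104", "state": "open",     "severity": 1},
]

def status_rollup(bugs):
    """Answer the standup questions: bugs per state, and severe bugs still open."""
    by_state = dict(Counter(b["state"] for b in bugs))
    sev1_open = sum(1 for b in bugs if b["state"] == "open" and b["severity"] == 1)
    return by_state, sev1_open
```

If answering "how many severity-1 bugs are still open?" takes more than a minute, the tracker is not doing its job.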
What's changed, what hasn't
The seven-step process and six quality indicators remain accurate. What has changed is the tooling and the cadence.
Test-management tools. The current baseline is a test-management tool (Xray, Zephyr Scale, TestRail, qTest, Testiny) integrated with your issue tracker — test cases, suites, cycles, and executions live in one place, tied to the tickets they verify. Spreadsheets are still fine for small teams and small scopes, but the moment you have more than a few dozen active cases the maintenance overhead of spreadsheets outweighs their simplicity.
CI-integrated execution. Automated regression runs every pull request. Full regression runs nightly or on merges to main. The test manager's job is to own the risk-based selection of what runs when, the analysis of what the runs are telling you, and the design of the exploratory charters that fill the gaps automation doesn't reach. Execution is continuous; cycles still exist conceptually, but they often map to sprints, release trains, or deployment windows rather than calendar weeks.
Flake management is now a discipline. When automated regression is the primary signal, a flaky test suite is worse than no test suite. Flake triage (quarantine, root-cause, re-enable), re-run strategies, and flakiness metrics are now part of normal execution. If your green-run rate on main is below 95%, the team isn't running tests — they're guessing.
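Flake detection itself is mechanical; a minimal sketch, assuming you retain per-test outcomes from recent runs on unchanged main:

```python
def flaky_tests(history: dict[str, list[str]], min_runs: int = 5) -> list[str]:
    """Flag quarantine candidates: tests with mixed pass/fail outcomes against
    the same code. `history` maps test name -> recent outcomes on main."""
    return sorted(
        name for name, runs in history.items()
        if len(runs) >= min_runs and "pass" in runs and "fail" in runs
    )

def green_run_rate(run_results: list[bool]) -> float:
    """Share of fully green runs on main: the 95% signal mentioned above."""
    return sum(run_results) / len(run_results) if run_results else 0.0
```

The triage loop (quarantine, root-cause, re-enable) starts from exactly this kind of list.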
LLM-assisted triage. Modern test-management tools and CI platforms now routinely use LLMs for failure clustering (group similar failures across runs), root-cause hypothesis generation, bug-report drafting from run artifacts, and executive-summary generation from cycle data. Treat these as force multipliers, not replacements: LLM-drafted content always needs a human review pass.
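The clustering idea doesn't require an LLM to grasp. Here is a deliberately crude stand-in using stdlib string similarity; the real tools use embeddings or LLM calls, but the shape of the output is the same, groups of failures that likely share a cause:

```python
from difflib import SequenceMatcher

def cluster_failures(messages: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: each failure message joins the first cluster whose
    representative (first member) it resembles closely enough."""
    clusters: list[list[str]] = []
    for msg in messages:
        for cluster in clusters:
            if SequenceMatcher(None, cluster[0], msg).ratio() >= threshold:
                cluster.append(msg)
                break
        else:  # no cluster matched: start a new one
            clusters.append([msg])
    return clusters
```

One cluster per likely root cause is what makes a 400-failure run triageable in an hour; the human review pass then works cluster by cluster, not failure by failure.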
Production telemetry as an input. Crash rates, error budget burn, user-reported issues, and release-by-release regression in key metrics (Sentry, Datadog, Crashlytics, Bugsnag, New Relic) now feed into cycle planning. A case that hasn't surfaced an issue in test in six releases but has surfaced telemetry regressions in production twice in the same period is a case that needs more attention.
Exploratory + scripted, not exploratory vs. scripted. The practitioner-level consensus has settled: the right execution mix is tightly scripted regression for the known-important paths, plus time-boxed exploratory charters for the unknowns. Charters get logged in the tool, findings attach to tickets, and the sum total of charter output is a meaningful contribution to the cycle, not a soft side activity.
Related
- Test Execution Process checklist — the printable one-pager.
- Test Release Processes — what gets a build into the lab before execution starts.
- Bug Reporting Processes — the process every bug report should follow.
- Quality Risk Analysis — how you pick which cases run first.
- Test Results Reporting Process checklist — what happens with the cycle's output.
Working on this?
Rex Black, Inc. has been coaching test teams on execution and cycle management since 1994. If you want help rebuilding your execution process, training your test leads, or rescuing a cycle that's off the rails — talk to us.