Part 4 of 6 · Investing in Software Testing
Testing the right things isn't enough — you also have to test them the right way. Static, structural, and behavioral techniques each catch different classes of defect, and a program that leans on one at the expense of the others leaves money on the table.
Read time: ~8 minutes. Written for engineering leads, QA managers, and developers who own or influence the test strategy.
Why technique matters as much as target
Part 3 covered what to test — how to rank quality risks so budget goes to the areas that matter most. This article covers how to test — because the right target hit with the wrong technique is still a miss.
There are three foundational families of test technique. Each catches a different class of defect, each has different upfront and per-execution costs, and each requires different skills to run. A program that invests in all three performs materially better than one that doesn't; a program locked into a single family (often behavioral alone, sometimes structural alone) leaves defects, and return, on the table.
Static testing — find defects before the code runs
A static test evaluates quality without actually executing the system. It looks at artifacts — requirements, designs, code, configuration — for defects, inconsistencies, and gaps.
Examples of static testing:
- Desk-checking your own code after you write it.
- Requirements, design, or code reviews (walkthroughs, inspections, Fagan reviews).
- Static analyzers (linters, type systems, SAST, dependency scanners).
- Architecture decision record (ADR) reviews.
- Schema and contract reviews on API and data designs.
- LLM-assisted review that flags likely bugs, security issues, or policy violations from diff context.
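To make the static-analyzer idea concrete, here is a minimal sketch (not a production linter) using Python's standard `ast` module: it flags bare `except:` clauses in source text without ever executing that code, which is exactly what makes it a static check.

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` clauses, without running the code."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

snippet = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(snippet))  # prints [4]
```

Real static analyzers layer hundreds of such checks, but each one follows this shape: inspect the artifact, report the finding, and never run the system.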
The economic argument for static testing is overwhelming. A defect caught in requirements review is prevented from ever being coded, built, deployed, reproduced, logged, triaged, or retested. Requirements reviews in published studies have shown returns as high as 800% on the review time invested, because the overhead avoided by not coding the wrong thing is enormous.
Static testing works only if it happens at the right time. Reviewing the requirements after the code is written recovers almost no value — the cheap bugs are already expensive. Review as soon as the artifact exists, while the author's assumptions are still fresh and no downstream work has been built on top of the defect.
Effective review also depends on having the right people in the room, working to agreed ground rules. Domain experts must attend requirements reviews. Architects must attend design reviews. Senior engineers must attend code reviews. Testers add real value in all three; they're pattern-matchers who spot inconsistencies, vagueness, and missing cases, but they must bring enough domain and technical literacy to contribute.
Structural ("white-box") testing — verify the system from the inside
Structural tests are dynamic (they run the system) and inside-out (they derive tests from knowledge of the implementation). Some people call them white-box or glass-box tests.
The typical targets are components and interfaces. Structural tests excel at finding localized errors — control-flow mistakes, data-flow bugs, boundary errors, error-handling paths that never fire.
The most valuable pattern in this family is a reusable automated test harness checked into source control alongside the component it tests. Developers author the tests as they code. A build system runs the harness on every change. The regression suite expands monotonically without proportional labor cost.
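As a sketch of this pattern, assume a small hypothetical component (`apply_discount`) with its tests checked in beside it in the same file tree; any build system can run the file on every change using Python's standard `unittest` runner.

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical component under test: apply a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTests(unittest.TestCase):
    """Checked in beside the component; the build runs this on every change."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_and_full_discount(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)
        self.assertEqual(apply_discount(80.0, 100), 0.0)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)

if __name__ == "__main__":
    unittest.main()
```

Each new test added here joins the regression suite permanently; that is the "expands monotonically without proportional labor cost" property in miniature.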
Structural test authoring is most efficient when developers do it — they have the inside knowledge the technique requires. The test function adds leverage by:
- Building and maintaining the harness itself.
- Teaching technique (boundary analysis, condition coverage, basis path testing, equivalence partitioning).
- Contributing to interface and integration tests that cross component boundaries.
- Setting coverage standards and gate thresholds.
Structural testing doesn't automatically mean "unit testing" — the same mindset applies to service-to-service contract tests, message queue adapter tests, and database schema migration tests. Any test that derives from implementation knowledge and executes the system in isolation qualifies.
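Boundary analysis and equivalence partitioning, two of the techniques listed above, can be sketched against a hypothetical `shipping_band` component: pick one value inside each partition, one at each edge, and one just beyond it, because off-by-one mistakes cluster at the edges.

```python
def shipping_band(weight_kg: float) -> str:
    """Hypothetical component: price band by parcel weight.
    Partitions: (0, 2] light, (2, 20] standard, (20, 30] heavy; outside is invalid."""
    if weight_kg <= 0 or weight_kg > 30:
        raise ValueError("unshippable weight")
    if weight_kg <= 2:
        return "light"
    if weight_kg <= 20:
        return "standard"
    return "heavy"

# Boundary-value cases: each partition's edges, plus a value just beyond each edge.
cases = [
    (0.01, "light"), (2.0, "light"),
    (2.01, "standard"), (20.0, "standard"),
    (20.01, "heavy"), (30.0, "heavy"),
]
for weight, expected in cases:
    assert shipping_band(weight) == expected, (weight, expected)

for invalid in (0.0, -1.0, 30.01):  # values just outside the valid range
    try:
        shipping_band(invalid)
    except ValueError:
        pass
    else:
        raise AssertionError(invalid)
print("all boundary cases pass")
```

Note that choosing these cases requires reading the implementation's branch conditions, which is what makes this a structural (inside-out) technique rather than a behavioral one.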
Behavioral ("black-box") testing — validate what the system does
Behavioral tests are dynamic and outside-in — they describe what the system does, not how it does it. They focus on workflows, configurations, integrations, performance, and the full end-to-end user experience.
Behavioral testing is typically where test professionals specialize because:
- It aligns with customer-observable quality (see Part 2 on fidelity).
- It requires disciplined design — exploratory methods, scripted tests, scenario decomposition.
- It supports the program's reporting obligations — behavioral failures are the ones leadership most directly understands.
Behavioral testing tools include performance and load generators, functional automation platforms (Playwright, Cypress, Appium, and their enterprise siblings), contract testing frameworks, API test platforms, and observability tools for production probing. Tools are classified as intrusive (running on the system under test) or non-intrusive (running externally and interacting with the system the way a real user would). Non-intrusive is strongly preferred wherever feasible — it measures reality, not an instrumented surrogate of reality.
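A minimal illustration of the non-intrusive idea, using only the Python standard library: a stub HTTP server stands in for the real (black-box) system, and the test interacts with it purely over the wire, asserting only on externally observable behavior, just as a real client would.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the system under test; treat it as a black box."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Non-intrusive behavioral check: talk to the service exactly as a client would.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url, timeout=5) as resp:
    payload = json.loads(resp.read())
    assert resp.status == 200
    assert payload == {"status": "ok"}
server.shutdown()
print("behavioral check passed")
```

Nothing here touches the server's internals; swap the stub for a deployed endpoint and the check is unchanged, which is the whole appeal of non-intrusive tooling.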
Manual behavioral testing remains important, especially for:
- Exploratory testing (Bach, Hendrickson, Kaner — the charter-based forms).
- Usability assessment.
- Localization review.
- Documentation and help verification.
- Live or acceptance testing — subjecting the system to real data, real user workflows, and real configurations. Beta programs, customer guest-testing arrangements, and acceptance environments are all live-testing patterns.
Skilled manual behavioral testers know how to follow a defect trail and blaze new ones. They blend scripted and exploratory modes, applying on-the-spot judgment that automation lacks. A program that abandons manual behavioral testing in pursuit of 100% automation typically discovers — too late — that it's lost the ability to catch a whole class of defect that only a human notices.
Sharing across techniques
The three families are separate in exposition but not in practice. Cross-pollination is where mature programs pull ahead:
- Behavioral teams can reuse structural test data and fixtures.
- Developers can reuse behavioral automation suites for their own regression runs.
- Test drivers written for unit tests can become load generators for performance testing.
- Harness infrastructure (containers, test accounts, fixtures) is shared across teams.
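The driver-to-load-generator reuse in particular can be sketched in a few lines: a functional test driver for a hypothetical `checkout` component is reused, unchanged, as a crude concurrent load generator.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def checkout(cart_total: float) -> float:
    """Hypothetical component under test: compute total with 8% tax."""
    time.sleep(0.001)  # simulate a little I/O latency
    return round(cart_total * 1.08, 2)

def driver() -> None:
    """Originally a unit-test driver: one functional check of the component."""
    assert checkout(100.0) == 108.0

# Reused as a crude load generator: run the same driver concurrently
# and measure aggregate throughput.
requests = 200
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    for future in [pool.submit(driver) for _ in range(requests)]:
        future.result()  # re-raise any functional failure
elapsed = time.perf_counter() - start
print(f"{requests} calls in {elapsed:.2f}s ({requests / elapsed:.0f}/s)")
```

A real performance harness adds ramp-up, think time, and percentile reporting, but the seed is the same driver the unit suite already maintains.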
Test techniques are tools, not religions. A program that treats them as tools produces better outcomes than one that treats them as identity.
A heuristic for blending the techniques
There's no universal ratio, but a useful starting point:
- Static testing should cover 100% of artifacts — every requirement, design, and commit should pass through some level of static check.
- Structural testing should cover every component at meaningful depth for the risk level (70%+ coverage for high-risk code, lighter for scaffolding).
- Behavioral testing should cover every critical user workflow plus the integration surfaces where components meet, with depth proportional to quality risk priority.
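For Python programs measured with coverage.py, a threshold like the 70% above can be enforced as a build gate with a few lines of configuration (a sketch of one tool's mechanism, not a complete config):

```toml
# pyproject.toml — hypothetical gate: fail the build below 70% line coverage
[tool.coverage.report]
fail_under = 70
show_missing = true
```

Most coverage tools in other ecosystems (JaCoCo, Istanbul, go test -cover) offer an equivalent fail-under switch; the point is that the threshold lives in version control, not in someone's memory.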
If you can only afford to invest in one additional discipline right now, static testing almost always wins on pure ROI grounds — it's the cheapest prevention mechanism available, and the returns are immediate.
What comes next
Knowing which technique family to apply is the first decision. The second, which applies specifically within structural and behavioral testing, is whether to run a given test manually or to automate it. Part 5 covers that decision, including the cost-benefit math that separates sensible automation from expensive failures.
Related resources
- Part 3 — The Risks to System Quality — prioritization feeds technique choice.
- Part 5 — Manual or Automated? — the automation decision.
- Charting Defect Data — how to measure whether your techniques are actually finding bugs.
- Test Estimation Process — sizing a program that balances technique investment.