Talk · Rex Black, Inc.

Managing complex test environments. A logistics problem, not a spreadsheet problem.

Some bugs only reveal themselves in field-like conditions: server crashes under load, data corruption under concurrent write, regressions that hit only on a specific OS/driver/firmware stack, AI model drift that hits only on a specific dataset version. That means the test environment itself becomes complex enough to need its own system of record — hardware, software, configurations, people, locations, schedules, and the relationships between all of them. This talk is the framework for building and running that system of record, so your test coverage against complex systems stops being accidental.

Slides
30
Worked example
SpeedyWriter
Entities modeled
5
Format
Methodology talk

Abstract

Complex systems find different bugs than simple ones.

Some bugs are shy. They only reveal themselves during performance, stress, volume, data-quality, and reliability testing — and the symptoms, when they finally appear, are the expensive kind: server crashes, sudden performance cliffs, database wedges, silent data corruption, model drift, regressions specific to a driver version. Because software is non-linear, test results in less-complex settings rarely extrapolate. Finding these bugs requires running the tests in environments that look like production.

That requirement creates a second-order problem. A production-like test environment is usually an interconnected mesh: servers, clients, networks, storage, firmware, OS configurations, model endpoints, dataset versions, third-party services, lab locations, calibration equipment, and the people who have to operate all of it on a schedule. At that point you have stopped running tests and started running a logistics operation. Without a system of record, the team ends up guessing which hardware is where, who's running what, which build is on which box, and why today's run doesn't match yesterday's.

This talk walks through the framework. What to model. How to model it using entity-relationship thinking — a technique that predates Airtable, Terraform, and the modern inventory tools by decades and still underwrites all of them. What reports to build off the model once it exists. How to tie the model to the test schedule, to the location graph, and to the software-release plan so every test run is reproducible and explainable. The SpeedyWriter case study at the end shows the whole thing end-to-end with real entities, real cycles, and real configurations.

Test results in less-complex settings rarely extrapolate, because software is not linear. If your customer runs it in a mesh, you have to test it in a mesh.

Rex Black, Inc.

Outline

What the talk covers, in order.

01

Why complex test environments exist at all

Shy bugs. Performance issues at scale. Volume and capacity limits. Data quality failures. Reliability and MTBF regressions. Driver- or firmware-specific failures. Model drift on specific dataset slices. These are the bug classes that hide in trivial environments and surface catastrophically in production. Testing for them requires the test environment to have enough complexity to provoke them, which is the whole reason test-environment management becomes its own discipline.

02

What a production-like environment actually is

Today this is a bigger graph than most teams model honestly. For enterprise systems, it's still dozens of interconnected services across cloud regions, on-prem data centers, and third-party APIs. For embedded and IoT, it's hardware, firmware, radios, and calibration equipment in specific physical locations. For AI/ML, it's model versions, dataset versions, GPU SKUs, cache states, prompt-template revisions, and evaluator configurations. The common structure: nodes, relationships, states, versions, and schedules. Everything else is details.

03

What to track in the system of record

Seven things minimum. Hardware installation, locations, and relocations. Current, historical, and planned hardware configurations. Hardware interconnections and networking. Test locations. Test infrastructure (harnesses, simulators, stubs). Test engineer assignments and locations. Human resource deployment. Track these and you can answer the three questions every test program gets asked at 2 PM on release day: where is it, who has it, and what software is on it.

  • Hardware: installed, its location, its configuration history, its network wiring.
  • Software: every version running on every host — OS, firmware, applications, tooling, test scripts.
  • People: who works where, in which role, on which cycle, across which shifts.
  • Schedules: which test suite runs at which location on which build on which date.
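The four record types above can be sketched as a minimal in-memory model. Everything here — asset IDs, hostnames, versions — is a hypothetical illustration, not data from the talk; the point is that the three release-day questions become lookups instead of memory.

```python
# Illustrative sketch: the minimum records a system of record needs to
# answer "where is it, who has it, and what software is on it".
# All identifiers and versions below are hypothetical examples.

hardware = {
    "SRV-0042": {"type": "server", "location": "LAB-A", "installed": "2024-03-01"},
}

software_installs = {
    # host -> {stack layer: version}
    "SRV-0042": {"firmware": "1.4.2", "os": "ubuntu-22.04", "app": "build-317"},
}

assignments = {
    "SRV-0042": {"tester": "jdoe", "cycle": "System-2"},
}

def release_day_answers(asset_id):
    """The three 2 PM questions, answered from the model, not from memory."""
    return {
        "where": hardware[asset_id]["location"],
        "who": assignments[asset_id]["tester"],
        "stack": software_installs[asset_id],
    }

print(release_day_answers("SRV-0042"))
```

Any inventory tool that can answer these three lookups — by asset key, with dates — satisfies the minimum bar the talk sets.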
04

Entity-relationship thinking, still the right frame

The original deck modeled this in Microsoft Access. Today the tool is Airtable, a purpose-built inventory DB, Terraform state plus a test-harness layer on top, or a JSON/YAML config in git — but the underlying technique is the same. Identify your entities (hardware, software, testers, tests, locations). List their properties. Pick the key properties that uniquely identify each entity. Identify relationships between entities (one-to-one, one-to-many, many-to-many). Note which relationships have their own properties. The E-R model is the conceptual backbone; whichever tool you pick stores it.

  • Entities: hardware, software, testers, tests, locations.
  • Key properties: serial numbers, build hashes, usernames, test IDs, site codes.
  • Relationships: tester-runs-test, hardware-is-at-location, software-installed-on-hardware, test-requires-hardware-configuration.
  • Relationship properties: date, cycle, pass/fail, time-on-task.
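The E-R steps above can be made concrete in plain Python. The entity names follow the talk; the fields and values are illustrative. The key move is the last one: the relationship itself (tester-runs-test) is a record with its own properties.

```python
# Sketch of entity-relationship thinking in code: entities carry key
# properties, and a many-to-many relationship carries properties of its own.
# Field choices and sample values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tester:
    username: str          # key property

@dataclass(frozen=True)
class Test:
    test_id: str           # key property
    suite: str

@dataclass
class TestRun:
    """The tester-runs-test relationship, with its own properties."""
    tester: Tester
    test: Test
    date: str
    cycle: str
    result: str            # pass / fail
    hours_on_task: float

run = TestRun(Tester("jdoe"), Test("T-101", "load"), "2024-06-03", "System-1", "fail", 2.5)
print(run.result)
```

Whether this lands in Airtable, SQL, or YAML in git, the shape is the same: keys on the entities, properties on the relationships.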
05

Budgeting and planning against the model

Once the entities are in place you can plan off them instead of off guesses. Testers and tests: maintain the test suite inventory and estimated durations, then assign testers to runs. Hardware and infrastructure: know the capacity and the reservation schedule. Locations and lab space: testers work somewhere, hardware is installed somewhere, and the relationship between the two is a scheduling constraint. The planning math — who's running what, on which hardware, at which site, on which build — is a join across the model, not a standalone spreadsheet.
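One instance of that planning math, sketched with hypothetical durations and capacity: summing estimated test hours per tester against weekly capacity is an aggregation over the model, not a standalone spreadsheet.

```python
# Illustrative planning check: total estimated test hours assigned to each
# tester versus a weekly capacity. All numbers are hypothetical; the point
# is that the answer falls out of the model's assignment records.
test_durations = {"T-101": 6.0, "T-102": 10.0, "T-103": 25.0}  # estimated hours

assignments = [
    ("jdoe", "T-101"), ("jdoe", "T-102"),
    ("asmith", "T-103"), ("asmith", "T-102"),
]

WEEKLY_CAPACITY = 30.0  # hours per tester per week (assumed)

def overbooked(assignments, durations, capacity):
    """Return testers whose assigned hours exceed capacity."""
    load = {}
    for tester, test_id in assignments:
        load[tester] = load.get(tester, 0.0) + durations[test_id]
    return {t: h for t, h in load.items() if h > capacity}

print(overbooked(assignments, test_durations, WEEKLY_CAPACITY))
```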

06

Reports, not tables — the management surface

Raw tables are hard to read and harder to cross-reference. The useful artifact is the report: multi-table joins organized for a reader, not for a DBA. Reports by tester ("what am I running this week"), reports by test ("where and when is this suite executing"), reports by location ("what's in this lab, for how long"), reports by cycle ("what's covered, what's not"), and summary rollups across phases. These are the artifacts management actually reads; the model exists to make them cheap to produce.

  • Per-tester assignment reports — the weekly plan.
  • Per-test schedule reports — the execution window.
  • Per-location reports — the capacity picture.
  • Per-cycle summary rollups — the coverage picture.
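The per-cycle coverage rollup can be sketched as a set comparison between planned and executed (cycle, platform, test) combinations. The run records below are hypothetical; the report shape is the point.

```python
# Sketch: a per-cycle coverage rollup produced from raw run records —
# a join organized for a reader, not a table dump. Data is hypothetical.
runs = [
    {"cycle": "System-1", "platform": "linux", "test": "T-101", "result": "pass"},
    {"cycle": "System-1", "platform": "mac",   "test": "T-101", "result": "fail"},
    {"cycle": "System-1", "platform": "linux", "test": "T-102", "result": "pass"},
]

# The planned matrix: every (cycle, platform, test) slot that should run.
planned = {("System-1", p, t) for p in ("linux", "mac") for t in ("T-101", "T-102")}

def coverage(runs, planned):
    """What's covered, what's not — the per-cycle summary rollup."""
    executed = {(r["cycle"], r["platform"], r["test"]) for r in runs}
    return {"covered": len(executed & planned), "missing": sorted(planned - executed)}

print(coverage(runs, planned))
```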
07

Locations matter more than teams admit

Equipment is installed in locations — test labs, computer rooms, colocations (like telephony servers at telco central offices, or IoT devices at customer sites). People work at locations too — labs, cubicles, home offices, customer sites, on the road. In a complex program the location graph changes over time: equipment relocates, labs reconfigure, people move. A test environment model that doesn't track locations over time cannot explain why a test ran differently this cycle than last. Capture location, and capture dates on location.

08

Software release tracking inside the model

Software is not one thing — it's a stack. The model has to hold BIOS/firmware versions, operating system versions, applications, virtual machines and interpreters, utilities, and test tools and scripts. The release plan ties specific revisions to specific cycles on specific platforms at specific dates. When a regression appears, the question "what changed in the stack between cycle N-1 and cycle N on this platform" has to be answerable from the model in a single query, not reconstructed from memory.
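The single query the paragraph demands can be sketched as a dictionary diff over recorded stack versions. Versions and layer names below are hypothetical; the structure — one stack record per (platform, cycle) — is the talk's requirement.

```python
# Sketch of "what changed in the stack between cycle N-1 and cycle N on
# this platform" as a diff over recorded versions. Values are hypothetical.
stacks = {
    # (platform, cycle) -> {stack layer: version}
    ("linux", 1): {"firmware": "1.4.1", "os": "ubuntu-22.04", "app": "build-310"},
    ("linux", 2): {"firmware": "1.4.2", "os": "ubuntu-22.04", "app": "build-317"},
}

def stack_diff(platform, cycle):
    """Layers whose version changed between cycle-1 and cycle, as (before, after)."""
    before, after = stacks[(platform, cycle - 1)], stacks[(platform, cycle)]
    return {layer: (before.get(layer), after.get(layer))
            for layer in before.keys() | after.keys()
            if before.get(layer) != after.get(layer)}

print(stack_diff("linux", 2))
```

When a regression appears in cycle 2, this diff is the first triage artifact: firmware and app moved, the OS did not.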

09

The SpeedyWriter worked example — what the model looks like end-to-end

The case study ships SpeedyWriter across five platforms (Mac, Win95, Win98, WinNT, Solaris) through three phases (Component, Integration, System) with three cycles each. That's a 5 × 3 × 3 lattice — forty-five distinct (platform, phase, cycle) slots, each with its own build revision, its own assigned tester, its own hardware allocation, and its own date. The model catalogs every slot; the reports slice it by tester, by platform, by phase, or by date. Replace the 1999 platforms with today's (four cloud regions × three runtime versions × three model revisions, say, or three firmware builds × two radio stacks × four physical labs) and the structure holds.

  • Revision identifier grammar: "C.1.Mac" = Component, Cycle 1, Mac build. A single naming convention makes the model legible.
  • Release dates are on the revision, not on the plan.
  • Tested configurations are a query, not a report — the same model answers them on demand.
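The 5 × 3 × 3 lattice and the revision-identifier grammar can be generated in a few lines. This is a sketch of the case study's structure, not its actual data: slot metadata (tester, hardware, dates) is omitted.

```python
# Sketch of the SpeedyWriter lattice: 5 platforms x 3 phases x 3 cycles
# = 45 (platform, phase, cycle) slots, each named with the revision
# grammar "C.1.Mac" = Component phase, cycle 1, Mac build.
from itertools import product

platforms = ["Mac", "Win95", "Win98", "WinNT", "Solaris"]
phases = {"Component": "C", "Integration": "I", "System": "S"}
cycles = [1, 2, 3]

slots = [f"{code}.{cycle}.{platform}"
         for (_, code), cycle, platform in product(phases.items(), cycles, platforms)]

print(len(slots))   # 45
print(slots[0])     # C.1.Mac
```

Swap the lists for today's axes — cloud regions, runtime versions, model revisions — and the same comprehension enumerates the matrix the reports slice.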
10

What goes wrong without the system of record

Three failure modes, all common. Materiel misunderstandings that defocus the team — two teams assume two different labs are the same lab, or two different firmware versions are the same version, and the test fails or succeeds for reasons no one can reconstruct. Missing coverage against critical customer-impacting bugs — the environment the customer hits wasn't in the matrix, because no one was maintaining the matrix. Bug irreproducibility — the build that triggered the bug can't be re-created because no one captured the stack at the moment of failure. All three go away when the environment is modeled instead of remembered.

Key takeaways

Four things to remember.

01

Treat the test environment as data.

If your environment state lives in heads, shared folders, and hallway conversations, your complex-system testing is guesswork. Model the entities and the relationships explicitly — in whatever tool fits the team.

02

Entities and relationships, not tools.

Airtable, Notion, a bespoke DB, Terraform state with a harness layer, or JSON in git — all fine. The entity-relationship model beneath them is what makes any of them pay off. Pick the tool that lets your team author the model without a DBA.

03

Reports are the product. The model is infrastructure.

The per-tester weekly plan, the per-cycle coverage rollup, the tested-configurations matrix — these are what management consumes. The entity model exists to generate them cheaply.

04

Dates and locations are first-class.

Equipment moves. Labs reconfigure. People change roles. Builds ship. Track everything with a date and a location, and regression triage stops being forensic work.

Worked examples


Five entities, modeled once, reported many ways.

The core entity set from the SpeedyWriter worked example, restated for a modern program. The entities are the same whether the test target is enterprise software, an embedded device, or an AI/ML system; the properties and relationships change per domain.

Hardware

Key: serial or asset ID.

Properties: type, spec, state (in-service / in-repair / retired), location, install-date, owner.

Relationships: installed at Location; runs Software; allocated to Test.

Software

Key: revision identifier (e.g. C.1.Mac or v4.2.1-gpu-a100).

Properties: phase, cycle, platform, build-date, artifact URL, checksum.

Relationships: installed on Hardware for a Cycle; under-test by Tester.

Tester / Test / Location

Tester key: username or employee ID; properties include role, shift, home location.

Test key: test ID; properties include suite, estimated duration, hardware requirement.

Location key: site code; properties include type (lab, datacenter, customer site, remote), capacity, timezone.

Relationships: Tester runs Test at Location on Date against Software on Hardware — the join every report in this domain is built on.

What reports fall out

Per-tester weekly plan (tests × days).

Per-cycle coverage matrix (platform × phase × cycle).

Per-location capacity view (hardware × assignment × date).

Per-build history (software × platform × tester × pass/fail).

Environment-drift diff (what changed in the stack between cycle N-1 and cycle N on this platform).

Closing

The deck's original tooling choice — Microsoft Access — is not the point, and never was. The point is that the test environment is a graph, and graphs need a system of record. Today that system looks different depending on the org: a shared Airtable base for a 20-person QA team, a purpose-built internal app at an enterprise, Terraform state plus a custom test-harness layer at a cloud-first company, or a YAML/JSON block in a git repo for a small infra team.

Whatever the tool, the work is the same: name the entities, pick the keys, model the relationships, write the reports, and keep the model honest as the program evolves. Teams that invest in this piece of infrastructure spend less time in release-day triage and catch more of the shy bugs the environment was built to find.


Want this talk delivered in-house?

Rex Black, Inc. delivers every talk on this site as a live workshop, a keynote, or a conference session. Tailored to your stack, your team, and your timeline.