Prove your agents
before production does.
Boards and customers remember what broke in the wild. Strong agent releases need someone who owns what “works” means, how you prove it, and what happens when the model or prompt changes next week.
32+ years of enterprise QA, applied to agents and LLM-backed systems. We don't expect you to become a tester to buy this work. The Test practice is the lane where that experience shows up: clear criteria, evidence you can show, regression discipline, training when your people need a shared language, and embedded testers when release risk is the job. Same firm as your strategy and build work; this is where quality becomes defensible.
Years in enterprise QA
ASTQB-accredited providers in the US
Testers certified worldwide
Everyone is shipping agents. Few can prove they hold up. Strong releases need clear criteria, evidence, and owners who stay on it when models and prompts change.
We help you make releases defensible on your timeline. Implementation and integration are core to Rex Black; so is QA. The Test practice brings 32+ years of judgment to agents and LLM-backed systems: plain-language risk, criteria you can show a buyer, and the training and maturity work when your org needs structure, plus embedded testers when you need coverage, not just more headcount.
- Agents in production before anyone wrote down what “good” looks like
- Leadership asks “how do we know it's safe?” and nobody can show the receipts
- Buyers or auditors want a process, not heroics. When they look, the story falls apart
- Every model or prompt change is a new risk, with no regression story to catch what broke
We focus on what holds up in production: evidence you can show, maturity when you need structure, and work that matters instead of checkbox theater. More than three decades of enterprise QA inform how we do it.
Capabilities.
Agents and AI products
You should be able to answer “what happens if this fails?” without improvising. 32+ years of QA judgment applied to agents: criteria, cases, evidence, regression, so LLM-backed systems meet the same bar as any release that can hurt you in public.
Embedded QA
ISTQB and CT-AI
TMMi and org maturity
Talk to us about
You need the right outcome, not the right QA vocabulary. Pick a door; we'll translate your situation into scope on the call. Same team, same bar.
Stop guessing if your agent is “good enough.” Criteria, runs, evidence, so you can stand behind the release.
Plain-English read on where testing breaks down and what to fix first, including formal paths when you need recognition.
Give your team one shared language (and AI-testing depth when you're shipping ML). Accredited programs.
Senior testers in your rhythm (cases, execution, automation) when headcount isn't the same as coverage.
When quality is already costing you
No QA degree required. These are the patterns we see when teams feel the squeeze before the headline.
It worked in the demo, not after deploy
Customers hit edge cases your team never scripted. Without a pass/fail story, every firefight looks like bad luck instead of a gap you can fix.
Releases keep breaking
Trust drops inside the org and with buyers. Audits get harder. Every hotfix trains the team to expect the next one.
Nobody can explain “how we know it's safe”
Leadership asks for confidence; engineering has opinions, not evidence. Sales and support get stuck defending what nobody measured.
Compliance or enterprise buyers want proof
Traceability, records, a grown-up process. Gaps show up when someone serious actually looks.
You can't scale testing by hiring alone
Headcount without structure means heroics and burnout, not a system that survives the next model update.
You shouldn't need to decode QA jargon to buy the work. Three phases: what we check, what we put in place, what we keep improving, so expectations stay obvious.
Assess. Build. Optimize.
Assess
2 to 3 weeks
- TMMi-aligned maturity and quality diagnostics
- Process and systems map
- Gaps, risks, prioritized report
Build
6 to 12 weeks
- ISTQB training (Foundation through Advanced; CT-AI where you test AI-based systems)
- Test cases and embedded manual QA
- Automation direction and tooling fit
- Documentation for compliance when needed
Optimize
Ongoing
- Deeper certs and specializations
- Agent and LLM evaluation with releases
- Performance and load where needed
- Governance and audit readiness
Also in scope
- Performance and load testing
- Compliance-oriented programs (SOC 2, HIPAA, FDA-aligned where relevant)
- Automation strategy, CI integration, maintainable suites
- Process and governance redesign
“We've built quality programs for organizations where failure doesn't create a support ticket. It creates a headline.”
Start with an assessment if that fits.
Roughly two to three weeks. Plain-language gaps and priorities. No obligation to keep us on.
More in this area
Articles, talks, guides, case studies, and reference artifacts that show up on the same kinds of engagements.
- Whitepaper
Evaluation Before Shipping: How to Test an AI Application Before It Hits Production
The release-gate playbook for AI features. Covers the five evaluation dimensions, how to build a lean golden set, where LLM-as-judge is trustworthy and where it lies, rollout mechanics with named exit criteria, and the regression suite that keeps a shipped AI feature from quietly rotting in production.
Read →
- Whitepaper
Choosing the Right Model (and Knowing When to Switch)
A practical framework for matching LLM model tier to task. Covers the four axes (capability, latency, cost, reliability), cascade routing patterns that cut cost 60 to 80 percent without measurable quality loss, switching costs you did not plan for, and the worked economics at 10K, 100K, and 1M decisions per day.
Read →
- Whitepaper
Beyond ISTQB: A Multi-Domain Certification Roadmap for Technical L&D
Most engineering L&D programs over-index on a single certification family (usually ISTQB on the QA side, AWS on the infrastructure side) and under-invest across the rest of the technical domains the org actually needs. This paper lays out a multi-domain certification roadmap (QA, AI, cloud, data, security, project management, software engineering) with sequencing logic for each level of the engineering ladder, plus the maintenance discipline that keeps the roadmap relevant as the technology shifts underneath it.
Read →
- Guide
The ISTQB Advanced Level path, mapped
The Advanced Level landscape keeps changing — CTAL-TA v4.0 shipped May 2025, CTAL-TM is on v3.0, CTAL-TAE is on v2.0. This guide maps all four core modules, prerequisites, exam formats, sunset dates, and which module a given role should take first. Links directly to the authoritative istqb.org syllabi.
Read →
- Whitepaper
Bug Triage: A Cross-Functional Framework for Deciding Which Defects to Fix
Bug triage is the cross-functional decision process that converts raw defect reports into prioritized action. Done well, it optimizes limited engineering capacity against risk; done poorly, it becomes a backlog-management ritual that neither fixes the important defects nor drops the unimportant ones. This whitepaper covers the triage process, the participants, the six action outcomes, the four decision factors, and the governance disciplines that keep triage effective in continuous-delivery environments.
Read →
- Whitepaper
Building Quality In: What Engineering Organizations Do from Day One
Testing at the end builds confidence, but the most efficient quality assurance is building the system the right way from day one. This whitepaper covers the upstream disciplines — requirements clarity, lifecycle selection, per-unit programmer practices, and continuous integration — that make system-level testing cheap and fast rather than the only thing holding a release together.
Read →
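The regression discipline these resources describe can be sketched in a few lines. The snippet below is a minimal, hypothetical golden-set release gate; all names, prompts, and the 0.9 threshold are illustrative assumptions, not taken from any of the papers above. The idea: run the candidate system over a small curated set of inputs with written-down expectations, compare the pass rate to a pre-agreed exit criterion, and fail the release if it drops.

```python
# Minimal golden-set regression gate (illustrative sketch; every name here
# is hypothetical). A "golden set" is a small, curated list of inputs paired
# with checks the output must satisfy. The gate runs before each release.

GOLDEN_SET = [
    # (input prompt, predicate the output must satisfy)
    ("What is 2 + 2?",            lambda out: "4" in out),
    ("Refund policy for orders?", lambda out: "refund" in out.lower()),
    ("Say hello.",                lambda out: len(out) > 0),
]

PASS_THRESHOLD = 0.9  # exit criterion agreed before the run, not after


def run_gate(model_fn, golden_set=GOLDEN_SET, threshold=PASS_THRESHOLD):
    """Return (passed, pass_rate, failing_prompts) for a candidate model."""
    failures = []
    for prompt, check in golden_set:
        output = model_fn(prompt)
        if not check(output):
            failures.append(prompt)
    pass_rate = 1 - len(failures) / len(golden_set)
    return pass_rate >= threshold, pass_rate, failures


# Stub standing in for the real agent/LLM call.
def stub_model(prompt):
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Refund policy for orders?": "Refunds are issued within 30 days.",
        "Say hello.": "Hello!",
    }
    return canned.get(prompt, "")


if __name__ == "__main__":
    passed, rate, failures = run_gate(stub_model)
    print(f"pass_rate={rate:.2f} passed={passed} failures={failures}")
```

In practice the predicates would be richer (rubrics, an LLM-as-judge with human spot checks), but the shape stays the same: written criteria, a repeatable run, and evidence you can show when someone asks how you know it's safe.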
Where this leads
Services and products that typically come next.
- Service · Quality engineering
Software Quality & Security
Independent test programs, security testing, and quality engineering for systems where defects cost real money.
Learn more →
- Solution
Risk Reduction & Clear Decisions
Quality programs and decision frameworks that shift risk discussions from anecdote to evidence.
Learn more →
- Solution
Reliable Software at Scale
Quality engineering programs for organizations whose software is now operationally critical.
Learn more →
- Government
Government & Defense
ISTQB-based test programs and security testing for federal, defense, and public-sector software.
Learn more →