Whitepaper · Third-Party & Integration Risk
Almost every system you ship today is an integration of things you didn't write — SaaS platforms, open-source libraries, LLM APIs, payment providers, auth providers, cloud primitives. Each of those components can quietly torpedo quality. This article names the four factors that make a component dangerous and the four strategies you can use to manage the risk.
Read time: ~11 minutes. Written for engineering managers, architects, and program owners responsible for systems that mix custom code with vendor and open-source components.
Everyone is an integrator now
Today it is almost impossible to build a non-trivial system without integrating code you didn't write. A typical new application of any size pulls in:
- SaaS platforms — CRM, billing, analytics, customer data, feature flags, error tracking, observability.
- Infrastructure services — cloud compute, object storage, managed databases, CDN, edge compute, managed Kubernetes.
- Identity and payments — auth providers, KYC vendors, payment processors, fraud scoring.
- AI and data APIs — foundation-model APIs, embeddings services, speech/vision models, retrieval services.
- Open-source packages — typically hundreds to thousands of transitive dependencies per service.
- Outsourced custom development — offshore or onshore partners building pieces of the system to spec.
Each of these is, from a quality perspective, an outsourced component. Some people call only the last one "outsourcing," but the same risk shape applies to all of them: another party made quality-and-testing decisions on your behalf, and you have to live with the consequences.
It's easy for a project manager to assume that using vendors reduces overall risk — the vendor is the expert, the component is "done." In practice, every integrated component carries its own quality risks, and some of them are worse than anything you'd have created in-house. This article covers the four factors that determine how bad those risks are, and the four strategies that work for managing them.
The methodology is deliberately component-agnostic. It applies equally well to a decades-old commercial database, an open-source charting library, an AI vendor's model API, or an offshore team's custom code.
A worked example
To make the discussion concrete, picture a midsize fintech building a lending application. The program manager is integrating three non-trivial components:
- A commercial SaaS identity / KYC platform that verifies applicant identity and runs compliance screening.
- An LLM API from a foundation-model vendor that summarizes application narratives and flags anomalies for underwriters to review.
- An outsourced custom development team building the application's lending-specific business logic and workflow engine.
Each of these is an integrated component. Each carries its own risks. And each calls for different strategies, as we'll see.
Four factors that increase quality risk
Four characteristics of an integrated component increase the risk it poses to your system. Working through each:
1. Coupling
Coupling is the degree to which a failure in the component propagates to the rest of the system. A highly coupled component, when it fails, brings down many things. A loosely coupled component, when it fails, brings down only itself.
In the example: the identity / KYC platform is highly coupled — without it, applications cannot be submitted at all. The LLM API is loosely coupled if its output is advisory (underwriters can still make decisions without it) and highly coupled if the workflow routes applications based on its score.
Questions to ask:
- What fails if this component returns an error? Slows down? Starts returning subtly wrong results?
- What fails if this component is unavailable for a minute? An hour? A day?
- Can we degrade gracefully, or does the whole system block?
Engineering practices that reduce coupling — circuit breakers, timeouts, retries with backoff, fallbacks, feature-flag shutoffs — don't eliminate the component's quality risk but they cap its blast radius.
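As an illustration of capping blast radius, here is a minimal circuit breaker with a fallback path. It is a sketch, not any particular library's API; the class names, thresholds, and the shape of `call` are all this example's own assumptions.

```python
import time

class CircuitOpen(Exception):
    """Raised when the breaker refuses calls and no fallback is available."""

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures,
    short-circuit calls for `reset_after` seconds instead of waiting on a
    component that is already known to be failing."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Breaker is open: skip the component entirely.
                if fallback is not None:
                    return fallback(*args, **kwargs)
                raise CircuitOpen("component unavailable and no fallback given")
            self.opened_at = None  # half-open: allow one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            if fallback is not None:
                return fallback(*args, **kwargs)
            raise
        self.failures = 0  # success resets the failure count
        return result
```

In the lending example, wrapping the LLM summarizer this way, with a fallback that simply returns the raw narrative, keeps a model outage from blocking underwriters.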
2. Irreplaceability
Irreplaceability is the cost and lead time required to swap the component out if it turns out not to work. A replaceable component is one for which equivalents exist and switching is cheap. An irreplaceable component, once chosen, has to be made to work.
In the example: the commercial identity / KYC platform might be replaceable — other vendors exist — but integration takes weeks and their terms, APIs, and data models differ. The custom-developed lending workflow is highly irreplaceable once work is underway: you've paid for it, it's bespoke, and writing a second one is a new project. The LLM API is partially replaceable — model vendors have converged on broadly similar APIs, and model-routing layers make swapping the underlying model tractable — but the prompts, guardrails, and evaluation sets will need re-calibration.
Questions to ask:
- How many viable alternatives exist for this component today?
- What would it cost to switch — in engineering effort, in retraining, in contract exit fees?
- Is the component using standard interfaces, or has it shaped your system in its own image?
Non-standard extensions to a standard component are a common way teams accidentally make themselves dependent on a vendor: the database is nominally replaceable, but the stored procedures aren't.
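One structural defense against that dependency is to own the interface yourself and keep each vendor behind an adapter. A hedged Python sketch of the idea, where all names are hypothetical, including the vendor client and its `complete` method:

```python
from typing import Protocol

class NarrativeSummarizer(Protocol):
    """The interface the lending app owns. Vendors plug in behind it;
    business logic never imports a vendor SDK directly."""
    def summarize(self, narrative: str) -> str: ...

class VendorASummarizer:
    """Hypothetical adapter for one model vendor. Everything
    vendor-specific (prompt shape, SDK calls) lives here, and only here."""
    def __init__(self, client):
        self._client = client  # vendor SDK client, injected

    def summarize(self, narrative: str) -> str:
        return self._client.complete(prompt="Summarize:\n" + narrative)

def route_application(summarizer: NarrativeSummarizer, narrative: str) -> str:
    # The workflow depends on the owned interface, not on any vendor.
    return summarizer.summarize(narrative)
```

Swapping vendors then means writing one new adapter and re-running the evaluation suite, rather than touching the workflow engine.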
3. Essentiality
Essentiality is whether the component's function is load-bearing for the product. A non-essential component can be dropped if it doesn't work. An essential one can't.
In the example: the LLM-based narrative summary for underwriters may be non-essential — if it doesn't work well, underwriters read the original narrative, which is mildly slower but not mission-critical. The identity / KYC check is essential — compliance rules prohibit shipping without it. The custom workflow engine is essential — it is the product.
Questions to ask:
- If this component didn't exist at all, would the product still work? Work differently? Not exist?
- Is this component the "nice-to-have" that marketing is pitching, or the load-bearing wall?
- If this component is unavailable, what do we tell users, and can they still accomplish anything?
Essentiality and coupling are related but distinct. A sidebar widget can be highly coupled (its crash brings down the page's render) and entirely non-essential (nothing in the business depends on it).
4. Vendor quality (and responsiveness)
The last factor is how good the vendor's own engineering, testing, and support actually are. A reputable vendor with a proven track record and fast turnaround on defect reports is a low-risk partner. A new entrant, an unfamiliar open-source project, or an offshore development firm with no track record is a higher-risk one, particularly when combined with slow or absent technical support.
In the example: the commercial identity / KYC platform may be a reputable industry name with mature SLAs — relatively low vendor-quality risk. The LLM vendor may have a strong record on the core API but still occasionally change behavior in ways that break downstream prompts — call that medium vendor-quality risk, especially around drift. The outsourced custom development team's vendor-quality risk is the big unknown: if they have worked on similar products before, with references you can actually talk to, risk is moderate. If they haven't, it's high.
Questions to ask:
- What is their track record on products and teams like ours?
- Can we talk to real customers who have shipped with them?
- When we report a defect, how quickly do we get a response, and how quickly does it ship as a fix?
- For AI vendors specifically: how much notice do we get about model updates that could change behavior?
Putting the four factors together
The four factors compound. A component that is highly coupled, irreplaceable, essential, and built by an unproven vendor is the worst case — a failure anywhere in that chain is a crisis with no graceful exit. The risk-management strategies below should be applied most aggressively there.
Conversely, a component that is loosely coupled, replaceable, non-essential, and built by a reputable vendor is practically risk-free and can be handled with the lightest possible diligence.
Most components fall somewhere in between. The factors give you a vocabulary for where.
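If it helps to make that vocabulary concrete, the compounding can be sketched as a crude multiplicative score. The 1-to-5 scale and the band thresholds below are illustrative assumptions for triage conversations, not calibrated values:

```python
def component_risk(coupling, irreplaceability, essentiality, vendor_risk):
    """Each factor scored 1 (low) to 5 (high); `vendor_risk` is the
    vendor-quality factor inverted, so 5 means an unproven vendor.
    Multiplicative, because the factors compound rather than add."""
    score = coupling * irreplaceability * essentiality * vendor_risk
    if score >= 256:   # roughly "high (4+) on all four factors"
        band = "crisis-in-waiting: apply every strategy aggressively"
    elif score >= 27:  # roughly "elevated on several factors"
        band = "significant: manage or test the vendor actively"
    else:
        band = "modest: trust plus lightweight diligence"
    return score, band
```

A score like this is only a conversation starter; its real value is forcing the team to rate each factor explicitly per component.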
Four strategies for managing the risk
Given a component with meaningful risk, four strategies cover the practical options.
1. Trust your vendor
Accept the vendor's component quality and testing. Allocate schedule and financial contingency for the possibility they got it wrong. Build fallbacks and graceful-degradation paths for the cases where coupling and essentiality would otherwise bite you.
This sounds naïve expressed that way, but most project teams do it for at least some components. The trick is to do it with eyes open — understand which components you are trusting, and have a contingency plan ready for each.
In the example: trusting the commercial identity / KYC platform is reasonable — they're regulated, mature, SLA-bound. Trusting the outsourced custom development team without any verification is a bet you probably shouldn't take.
If "trust" is your strategy, an acceptance test at delivery time is your only real defense, and for custom-developed components you can't run it until late in the project. If the component fails acceptance, your options are ugly: contract renegotiation, lawsuit, starting over. All expensive, all disruptive.
2. Manage your vendor's testing
Integrate and manage the vendor's testing as part of your own, distributed test program. Require visibility into what they're testing, how, and with what results. This only works when you have enough leverage to insist on it.
In the example: this strategy has a reasonable shot with the outsourced custom development team, especially if you are a big enough customer that they're motivated to retain your business. A small development firm may actively welcome the oversight — they learn from your processes. The commercial SaaS identity platform will likely refuse outright: you are a small part of their customer base, and they have their own QA function, product roadmap, and release cadence.
For COTS and SaaS vendors specifically, expanding their testing under your oversight opens the door to open-ended requirements drift on their side. Smart COTS vendors refuse this as a matter of policy, or treat it as a paid customization engagement on time-and-materials.
3. Fix your vendor's quality and testing
Go in and revamp the vendor's test processes, or build them new ones. This assumes the vendor accepts the assessment, is willing to be changed, and that you have the clout to require it.
This is a big commitment and only makes sense when (a) the component is essential and hard to replace, (b) the vendor's capability gap is fixable within your project's scope, and (c) continuing the relationship is the right call for other reasons. Done well, it produces a better component and a better long-term vendor. Done badly, it becomes a second project inside your project.
In the example: this might work for the outsourced custom development team if an assessment reveals specific, addressable weaknesses. It is almost never a realistic option for a commercial SaaS vendor — if they'd agree to let you restructure their testing, that's a signal you're looking at a prototype, not a product.
4. Test your vendor's component yourself
Assume the vendor's testing is insufficient and retest the component yourself. Allocate budget for the retest. Deal with the fact that most vendors will push back on every bug report, reclassifying it as either expected behavior or a change request.
In the example: if early deliveries from the outsourced team reveal systemic quality problems, retesting is the realistic fallback. You might still continue with that vendor because contract costs and schedule realities preclude switching, but the test burden now includes the work they should have done. For the LLM API, continuous output evaluation (eval suites, drift detectors, A/B comparison across model versions) is testing your vendor's component yourself — and it is rapidly becoming table stakes for any production LLM integration.
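A minimal sketch of what that continuous evaluation looks like, assuming a frozen eval set of input/expected pairs and a pass-rate baseline recorded when the prompts were last calibrated (the function names and the 5% tolerance are this sketch's assumptions):

```python
def run_eval(model_fn, eval_set, passes):
    """Run a frozen eval set through the model and return the pass rate.
    `eval_set` is a list of (input, expected) pairs; `passes` decides
    whether a model output is acceptable for a given expected answer."""
    ok = sum(1 for inp, expected in eval_set if passes(model_fn(inp), expected))
    return ok / len(eval_set)

def detect_drift(baseline_rate, current_rate, tolerance=0.05):
    """Flag when the current pass rate drops more than `tolerance`
    below the baseline from the last prompt calibration."""
    return (baseline_rate - current_rate) > tolerance
```

Run on a schedule and on every vendor-announced model update, this is the LLM equivalent of regression testing a component you didn't write.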
Which strategy, when?
The strategies are not mutually exclusive; most real programs apply different strategies to different components in the same system. A useful decision heuristic:
| Vendor type | Coupling | Essentiality | Starting strategy |
|---|---|---|---|
| Reputable COTS / SaaS with SLA | Low–Med | Non-essential | Trust + acceptance test at integration |
| Reputable COTS / SaaS with SLA | High | Essential | Trust + extensive integration testing + contingency plan + fallback mechanism |
| LLM / AI API | Any | Any | Continuous eval suite + drift detection (always Strategy 4 at some level) |
| Open-source library | Any | Any | Trust + pinned versions + supply-chain scanning + your own integration tests |
| Outsourced custom development | Any | Any | Start with Manage; escalate to Fix or Test based on early signals |
The heuristic is a starting point, not a recipe. The factors discussed above — and the specific context of the project — ultimately decide.
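For teams that want the heuristic in executable form, the table can be encoded as a first-pass lookup. The category names and strategy strings are this sketch's own shorthand for the rows above, not a standard:

```python
def starting_strategy(vendor_type, coupling="any", essentiality="any"):
    """First-pass strategy lookup mirroring the heuristic table.
    Anything unlisted falls back to assessing the four factors directly."""
    if vendor_type == "cots_saas":
        if coupling == "high" and essentiality == "essential":
            return "trust + extensive integration testing + contingency + fallback"
        return "trust + acceptance test at integration"
    if vendor_type == "llm_api":
        return "continuous eval suite + drift detection"
    if vendor_type == "open_source":
        return "trust + pinned versions + supply-chain scanning + integration tests"
    if vendor_type == "outsourced_custom":
        return "manage; escalate to fix or test on early signals"
    return "assess the four factors directly"
```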
Political and contractual implications
All four strategies have political implications. If quality problems emerge mid-project, the vendor is unlikely to accept your assertion that their testing is incompetent or their quality is unacceptable. They may attack your team's credibility instead. If a senior executive at your company was the one who made the vendor-selection call — and it was probably an expensive one — there is a non-zero chance that executive sides with the vendor against your team.
Bring data to those conversations. Defect rates, test coverage of the vendor's work, bug-reproducibility evidence, comparison to SLA thresholds. Vague assertions of "bad quality" lose; specific, quantified claims win.
Even better: influence the contract before the project starts.
- For custom-developed components: require the vendor to provide their test plan, test results, and evidence of defect-density trends before payment. Require that your acceptance testing be a payment gate, not a post-payment formality.
- For SaaS / API vendors: negotiate SLAs that include measurable quality dimensions (uptime, error rate, latency, accuracy for AI services where relevant). Get model-update notice periods in writing for AI vendors. Understand deprecation policies.
- For open-source dependencies: establish a supply-chain scanning pipeline (SBOM, CVE tracking, license compliance) and a policy for what happens when a critical dependency goes unmaintained.
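As a toy illustration of the open-source policy above, here is a stand-in for a scanner that reads a CycloneDX-style SBOM (a JSON document whose `components` array carries name/version entries) and flags unpinned or known-vulnerable components. A real pipeline would use a dedicated scanner; this only shows the shape of the check:

```python
import json

def flag_sbom_risks(sbom_json, known_bad):
    """Flag SBOM components that lack a pinned version or that appear
    in `known_bad`, a set of (name, version) pairs. Illustrative only."""
    sbom = json.loads(sbom_json)
    findings = []
    for comp in sbom.get("components", []):
        name, version = comp.get("name"), comp.get("version")
        if version is None:
            findings.append((name, "unpinned version"))
        elif (name, version) in known_bad:
            findings.append((name, "known-vulnerable " + version))
    return findings
```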
Make it clear to the vendor that payment depends on quality delivery. It is remarkable how motivational that turns out to be.
Don't skip your own integration and system testing
Even with every component-level risk managed perfectly, the integrated whole can still fail in ways no individual component would. A correctly behaving identity platform, a correctly behaving LLM API, and a correctly behaving workflow engine can still produce a broken lending flow, because the three never meet each other the way your users will make them.
The corollary to every strategy above: budget for your own integration testing and system testing, targeting the seams between components and the end-to-end workflows that matter to users. The fact that each component is "good" doesn't mean the composition is. It often isn't.
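A sketch of a seam-level test in that spirit, with a simplified flow and stubbed components (all names are illustrative): the KYC stub passes, the LLM stub times out, and the flow must still reach the underwriter.

```python
def process_application(applicant, narrative, kyc, summarize):
    """Simplified lending flow: KYC is essential (hard failure),
    the LLM summary is advisory (degrade gracefully)."""
    if not kyc(applicant)["verified"]:
        raise ValueError("KYC failed: application blocked")
    try:
        summary = summarize(narrative)
    except Exception:
        summary = None  # underwriter reads the raw narrative instead
    return {"summary": summary, "narrative": narrative,
            "status": "ready_for_review"}

def test_flow_survives_llm_outage():
    kyc_ok = lambda applicant: {"verified": True}
    def llm_down(narrative):
        raise TimeoutError("model unavailable")
    result = process_application({"id": 1}, "applicant story", kyc_ok, llm_down)
    assert result["status"] == "ready_for_review"
    assert result["summary"] is None
    assert result["narrative"] == "applicant story"
```

The point of the test is the seam: each stub behaves "correctly" in isolation, and the assertion is about how the composition degrades.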
Takeaways
- Every modern system is an integration. The risk shape is the same whether the component is a COTS database, a SaaS identity platform, an LLM API, an open-source package, or an offshore custom build.
- Four factors determine how much risk a component carries: coupling, irreplaceability, essentiality, and vendor quality. The factors compound — the worst case is a component that is high on all four.
- Four strategies cover the realistic options: trust, manage, fix, test. Most real programs apply different strategies to different components within the same system.
- Influence the contract before the project starts. Data-driven conversations about quality land; vague ones don't.
- Regardless of what the vendors did, always plan for your own integration testing and system testing. The whole can fail in ways no individual component did.
Further reading
- Flagship whitepaper: Quality Risk Analysis — the five-technique taxonomy and seven-step process that the risk assessment here plugs into.
- Checklist: Quality Risk Analysis Process — a printable working checklist for running the analysis with stakeholders.
- Case study: A Risk-Based Testing Pilot: Six Phases, One Worked Example — how the risk framework plays out on a disciplined pilot.
- Talk: Managing Complex Test Environments — the logistics layer of integration-heavy test programs.