// FRAMING
Zero-fee is an architectural property, not a pricing trick
The phrase “zero-fee orchestration” is easy to misread as marketing. It isn't — at least not when the architecture is built honestly. It's a structural property that falls out of one specific design decision: the orchestration layer and the acquirer relationships are owned by the same operator, so the orchestration cost is internal to that operator's P&L instead of externalized to the merchant as a separate fee.
But the architecture has to actually deliver on that. Stapling a routing layer onto someone else's acquirer relationships gets you the marketing claim without the performance — you still pay the middleman cost in latency, in surface area, and in failure modes, even if you've hidden it in the rate card.
This article walks through what a real zero-fee architecture looks like at the transport, vault, and settlement layers. The companion positioning article covers the economics. This one covers how the engineering actually works.
// THE PROBLEM
What the middleman routing hop costs you
In a pure-play orchestration setup, every transaction traverses two cooperating networks: the merchant's application sends the payment to the orchestration platform, which then proxies it to whichever downstream acquirer the routing rules selected. Both networks must be online, both must respond within the merchant's timeout budget, and both must reconcile what they think happened against the other's ledger.
Concretely, the middleman hop costs you in four ways:
- Latency. The extra network hop between the orchestrator and the acquirer typically adds 80–150ms per transaction, depending on TCP RTT between the two data centers and TLS handshake reuse. On a single transaction it's invisible; across a recurring batch of 50,000 it's minutes of cumulative authorization delay.
- Surface area. Two systems to authenticate against, two API contracts to version, two webhook schemas to handle. Failure modes double: the orchestrator can be up while the acquirer is degraded, or the acquirer healthy while the orchestrator is rate-limiting you. Diagnosing which side broke takes longer when both are someone else's.
- Reconciliation drift. Two ledgers means two settlement records that have to agree. Most days they do; the days they don't are where finance teams lose half a week to manual reconciliation. The drift usually isn't fraud; it's the orchestrator's and the acquirer's lifecycle events firing in different orders or with different timestamps.
- Failover blast radius. When the orchestration platform itself has a degraded incident, every acquirer behind it is effectively offline from the merchant's point of view, even the ones processing normally. A native multi-acquirer setup survives any single acquirer going down; a pure-play orchestrator in front of those acquirers introduces a new single point of failure above them.
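The blast-radius point can be made concrete with a back-of-the-envelope availability calculation. The sketch below assumes independent failures and uses made-up availability figures (99.9% per acquirer, 99.95% for the orchestrator); neither number comes from a real platform, and the function names are illustrative.

```python
from math import prod

def parallel_availability(acquirers: list[float]) -> float:
    """Probability that at least one acquirer is up (independent failures assumed)."""
    return 1.0 - prod(1.0 - a for a in acquirers)

def fronted_availability(orchestrator: float, acquirers: list[float]) -> float:
    """An orchestrator in series in front of the acquirers caps the whole path."""
    return orchestrator * parallel_availability(acquirers)

# Hypothetical figures, for illustration only.
acquirers = [0.999, 0.999, 0.999]
orchestrator = 0.9995

native = parallel_availability(acquirers)
fronted = fronted_availability(orchestrator, acquirers)
print(f"native multi-acquirer: {native:.9f}")
print(f"behind orchestrator:   {fronted:.9f}")
```

Whatever the real numbers are, the series term means the fronted path can never be more available than the orchestrator itself, no matter how many acquirers sit behind it.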
// THE TOPOLOGY
What a unified payment stack looks like
The unified architecture eliminates the middleman by collapsing the orchestration layer into the same operator that owns the acquirer relationships. From the merchant's perspective:
Merchant application
│
▼
┌──────────────────────────────────────┐
│ Single SDK / one API contract │
│ - Tokenization │
│ - Payment intent lifecycle │
│ - Webhook surface │
└─────────────┬────────────────────────┘
│ (TLS, signed, idempotent)
▼
┌──────────────────────────────────────┐
│ Operator's payment stack │
│ ┌────────────────────────────────┐ │
│ │ Network-token vault │ │
│ │ (VTS + MDES, valid across │ │
│ │ all downstream acquirers) │ │
│ └────────────────────────────────┘ │
│ ┌────────────────────────────────┐ │
│ │ Routing engine │ │
│ │ BIN-aware primary selection, │ │
│ │ decline-code-aware retry, │ │
│ │ failover with state preserve │ │
│ └────────────────────────────────┘ │
│ ┌────────────────────────────────┐ │
│ │ Acquirer adapters (internal) │ │
│ │ Acquirer A / B / C / D / ... │ │
│ └────────────────────────────────┘ │
└─────────────┬────────────────────────┘
│ (private network, in-DC RTT)
▼
┌──────────────────────────────────────┐
│ Selected acquirer → card scheme │
│ (Visa, Mastercard, Amex, ...) │
└──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ Operator settlement ledger │
│ One settlement record per cycle, │
│ regardless of routing path. │
└──────────────────────────────────────┘
The merchant's SDK call lands on the operator's stack once. The vault, the routing engine, and the acquirer adapters all live inside that stack (same process or same VPC), talking over private network rather than over public TLS. The selected acquirer is the only external hop, and that hop is identical to what a direct-acquirer integration would have done anyway.
// VAULT
Tokenization that follows the route
The vault is the layer where the unified architecture pays off most concretely. In a pure-play setup, the vault sits on the orchestrator's side — but the acquirer needs a PAN (or a network token registered to that acquirer) to authorize the charge. Three things follow:
- The orchestrator has to either store raw PAN (PCI-scope risk) or store network tokens that are per-acquirer — which means one tokenized card maps to N tokens, one per downstream acquirer.
- Failover between acquirers requires the vault to either detokenize the network token back to PAN (latency + PCI exposure) or pre-register the card with multiple acquirers' network-token services (operational cost).
- When the underlying card is reissued, the vault has to propagate the update to every downstream acquirer's token registry. Most pure-play vaults don't — the merchant ends up with stale tokens at the secondary acquirers.
In the unified architecture, the vault and the acquirer relationships share the same operator. The same network token is registered against the operator's merchant of record, and the operator can present that token to any of its acquirer relationships without re-tokenization. Failover is a routing decision, not a vault round-trip. Card reissues update the token once, and every downstream acquirer sees the new card.
From the merchant's perspective, this means:
- One token per card, not N tokens per card. Storage cost, reconciliation surface, and cross-acquirer token migration complexity all collapse.
- No detokenization on failover. The token authenticates against any acquirer the router picks, including the failover target.
- Single network-tokenization upgrade path. When Visa or Mastercard ships a network-tokenization feature update, the operator implements it once and every downstream acquirer relationship picks it up.
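The difference in token cardinality can be sketched as two toy data structures. Every name here (`PerAcquirerVault`, `UnifiedVault`, the token strings) is hypothetical and does not reflect any real vault API; the only point is how many records a card reissue has to touch in each shape.

```python
from dataclasses import dataclass, field

@dataclass
class PerAcquirerVault:
    """Pure-play shape: one network token per (card, acquirer) pair."""
    tokens: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, card_id: str, acquirers: list[str]) -> None:
        self.tokens[card_id] = {a: f"tok_{card_id}_{a}_v1" for a in acquirers}

    def reissue(self, card_id: str) -> int:
        """Refresh every per-acquirer token; returns how many records changed."""
        per_acquirer = self.tokens[card_id]
        for acquirer in per_acquirer:
            per_acquirer[acquirer] = f"tok_{card_id}_{acquirer}_v2"
        return len(per_acquirer)

@dataclass
class UnifiedVault:
    """Unified shape: one token per card, presented to whichever acquirer is routed."""
    tokens: dict[str, str] = field(default_factory=dict)

    def register(self, card_id: str) -> None:
        self.tokens[card_id] = f"tok_{card_id}_v1"

    def token_for(self, card_id: str, acquirer: str) -> str:
        # The acquirer argument is deliberately ignored: same token on every route.
        return self.tokens[card_id]

    def reissue(self, card_id: str) -> int:
        self.tokens[card_id] = f"tok_{card_id}_v2"
        return 1

pure_play = PerAcquirerVault()
pure_play.register("card1", ["A", "B", "C"])
unified = UnifiedVault()
unified.register("card1")
```

In this toy model a reissue touches three records in the per-acquirer vault and one in the unified vault, and `token_for` returns the same value for a primary or a failover acquirer.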
// CASCADE
Cascading retries without re-tokenization
Cascading — the practice of retrying a soft-declined transaction against a different acquirer when the first says no — is one of the marquee features of any orchestration platform. In a pure-play setup, cascading is implemented as a sequence of external API calls: the orchestrator calls acquirer A, gets the decline, calls acquirer B, and so on. Each acquirer's tokenization requirements have to be satisfied separately.
In the unified architecture, cascading is a state machine inside one process:
attempt(card_token) → acquirer_A
decline (soft, code 05)
↓
classify(decline) → recoverable_soft
↓
route_next(card_token, acquirer_A) → acquirer_B
↓
attempt(card_token) → acquirer_B
approve
↓
return success to merchant
The token doesn't move. The card data doesn't re-cross any external boundary. The merchant sees one successful transaction with a routing trace explaining which acquirer eventually approved. The latency cost of the cascade is the network hop to acquirer A plus the network hop to acquirer B, exactly what it would have been in a direct-acquirer integration where the merchant implemented the cascade themselves.
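A minimal Python sketch of that in-process cascade follows. It is an illustration under stated assumptions, not the operator's implementation: `AuthResult`, `CascadeOutcome`, `cascade`, and the stub adapters are hypothetical names, the routing order is a fixed list rather than a dynamic `route_next` decision, and only decline code 05 is treated as recoverable because it is the only code the trace above mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical classification: only "05" appears in the trace above; a real
# router would carry a much richer, scheme-specific table.
RECOVERABLE_SOFT = {"05"}

@dataclass
class AuthResult:
    approved: bool
    decline_code: Optional[str] = None

@dataclass
class CascadeOutcome:
    approved: bool
    trace: list[tuple[str, str]] = field(default_factory=list)

# An acquirer adapter is modeled as a function from card token to AuthResult.
Adapter = Callable[[str], AuthResult]

def cascade(card_token: str, adapters: dict[str, Adapter]) -> CascadeOutcome:
    """Try acquirers in order, reusing the same token, until one approves."""
    outcome = CascadeOutcome(approved=False)
    for name, attempt in adapters.items():
        result = attempt(card_token)
        if result.approved:
            outcome.trace.append((name, "approve"))
            outcome.approved = True
            return outcome
        outcome.trace.append((name, f"decline:{result.decline_code}"))
        if result.decline_code not in RECOVERABLE_SOFT:
            break  # not classified as recoverable: stop cascading
    return outcome

# Stub adapters standing in for acquirer A (soft decline) and B (approve).
adapters = {
    "acquirer_A": lambda token: AuthResult(False, "05"),
    "acquirer_B": lambda token: AuthResult(True),
}
outcome = cascade("tok_demo", adapters)
```

The same `card_token` string is passed to every adapter, which is the property the section is describing: the retry is a loop iteration, not a vault round-trip.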
In contrast, a pure-play orchestrator's cascade runs: merchant to orchestrator → orchestrator to acquirer A → decline back to orchestrator → re-tokenize for acquirer B (sometimes) → orchestrator to acquirer B → approval back to orchestrator → response back to merchant. That is six network hops minimum, with external TLS on each.
// SETTLEMENT
Direct settlement bypasses the middleman ledger
Settlement is the back-office equivalent of the transport-layer middleman hop. In a pure-play setup, the acquirer settles funds to the merchant on its own schedule; the orchestrator gets reporting visibility but doesn't sit in the money flow. Some platforms front-settle to the merchant out of their own funds; most don't.
In either case, the merchant's reconciliation system sees N settlement records per cycle — one from each acquirer — plus an orchestration-side reporting view that attempts to unify them. The unification is usually best-effort: timestamps differ, batch boundaries differ, partial captures land at different times.
In the unified architecture, settlement is consolidated at the operator. The operator collects from the acquirers on the schedule the merchant negotiated (T+1, T+2, whatever), unifies the records on its own ledger, and settles to the merchant once per cycle with a single reconciliation file that already accounts for which acquirer processed which transaction. The merchant's finance system sees one settlement, one record, one reconciliation surface.
This isn't just convenience — it's a chargeback-program prerequisite. Card-network chargeback ratios are computed against the merchant of record. In a pure-play setup where each acquirer is the merchant of record for its own slice, the ratio computation gets sliced too: a merchant that's well under threshold in aggregate can be in violation at one specific acquirer because its share of the volume happened to skew toward the disputed transactions. The unified-operator model sees one merchant of record across all acquirers, and the ratio is computed once against the full volume.
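The slicing effect is easy to see with numbers. The worked example below uses invented monthly volumes and an illustrative 1.0% threshold; actual scheme program thresholds and counting rules vary and are not taken from this article.

```python
# Hypothetical monthly volumes: (transactions, chargebacks) per acquirer.
volume = {
    "acquirer_A": (60_000, 300),
    "acquirer_B": (30_000, 150),
    "acquirer_C": (10_000, 150),
}
THRESHOLD = 0.01  # illustrative 1.0% ratio, not an actual scheme rule

def ratio(transactions: int, chargebacks: int) -> float:
    """Chargeback ratio as chargebacks divided by transactions."""
    return chargebacks / transactions

# Sliced view: each acquirer evaluates only its own share of the volume.
per_acquirer = {name: ratio(*counts) for name, counts in volume.items()}
breaches = [name for name, r in per_acquirer.items() if r > THRESHOLD]

# Unified view: one ratio computed against the full volume.
total_tx = sum(tx for tx, _ in volume.values())
total_cb = sum(cb for _, cb in volume.values())
aggregate = ratio(total_tx, total_cb)

print(per_acquirer, breaches, aggregate)
```

With these inputs the aggregate ratio is 0.6%, under the illustrative threshold, while `acquirer_C` alone sits at 1.5% and would be flagged in the sliced view.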
// LATENCY BUDGET
Where the under-300ms target comes from
The unified architecture's latency budget for a single approval cycle:
| Hop | Budget |
|---|---|
| Merchant app → operator API (public TLS, 1 RTT) | 40–80ms |
| Vault lookup + routing decision (in-process) | 5–15ms |
| Operator → selected acquirer (private network) | 10–30ms |
| Acquirer → card scheme authorization | 80–180ms |
| Response unwind back to merchant | 40–80ms |
| Total target P95 | under 300ms |
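Summing the rows shows how much slack the target leaves. The snippet below copies the per-hop ranges from the table and adds them up; the variable names are illustrative, and how the P95 itself is measured is not stated in the table, so treat this as arithmetic on the budget rather than a latency model.

```python
# Per-hop budget ranges in milliseconds, copied from the table above.
budget_ms = {
    "merchant -> operator API": (40, 80),
    "vault lookup + routing": (5, 15),
    "operator -> acquirer": (10, 30),
    "acquirer -> scheme auth": (80, 180),
    "response unwind": (40, 80),
}
TARGET_P95_MS = 300

best_case = sum(lo for lo, _ in budget_ms.values())
worst_case = sum(hi for _, hi in budget_ms.values())
midpoint = (best_case + worst_case) / 2
headroom_at_midpoint = TARGET_P95_MS - midpoint

print(best_case, worst_case, midpoint, headroom_at_midpoint)
```

The lower bounds sum to 175 ms and the upper bounds to 385 ms, so the under-300ms target holds only when the hops do not all sit at their upper bound at once. The scheme authorization row is by far the widest range and is the one hop neither architecture controls.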
The same transaction through a pure-play orchestrator crosses two public-network hops (merchant → orchestrator and orchestrator → acquirer, over TLS each time) where the unified stack crosses one, and it adds the orchestrator's own routing-decision time. Realistic additional cost: 80–150ms. The pure-play stack's P95 lands around 400–500ms for the same authorization path, with more variance because two independent networks have to cooperate.
The 100–200ms difference is not academic. At checkout, every additional 100ms of perceived latency translates to a measurable cart-abandonment delta. On 3DS challenges, the issuer's own timeout starts ticking the moment the orchestrator initiates, so extra latency on the merchant-to-orchestrator hop is latency the issuer's timeout is counting against you. In recurring-batch scenarios, the cumulative time-to-completion of a 50,000-charge batch differs by tens of minutes between the two architectures.
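The batch figure depends heavily on how many authorizations run in parallel, which the article does not specify. The sketch below assumes a simple wave model (a fixed number of charges in flight, each paying the full extra latency); `added_batch_minutes` and the concurrency levels are hypothetical.

```python
def added_batch_minutes(charges: int, extra_ms: float, concurrency: int) -> float:
    """Extra wall-clock minutes when each charge pays extra_ms and
    `concurrency` charges are in flight at a time (simplified wave model)."""
    waves = -(-charges // concurrency)  # ceiling division
    return waves * extra_ms / 1000 / 60

# Extra time for a 50,000-charge batch at the 100 ms and 200 ms ends of the gap.
for concurrency in (1, 4, 16):
    low = added_batch_minutes(50_000, 100, concurrency)
    high = added_batch_minutes(50_000, 200, concurrency)
    print(f"concurrency={concurrency}: {low:.1f} to {high:.1f} extra minutes")
```

Under this model the "tens of minutes" gap corresponds to serial or lightly parallel batch runs; a heavily parallel batch shrinks the wall-clock difference even though the per-charge penalty is unchanged.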
// THE CONSTRAINT
Why this architecture is only available inside an acquirer-plus-orchestration operator
The unified architecture is a structural property of the operator, not a configuration option. Three prerequisites have to hold simultaneously:
- The operator owns the acquirer relationships. PCI-DSS Level 1, scheme membership at the acquirer level (not just sponsored), KYC/AML infrastructure, capital reserve, regulatory posture. This is bank-relationship work, not API work.
- The operator runs the network-token vault. Visa Token Service + Mastercard MDES, registered to the operator's merchant of record. The vault can't be a wrapper around someone else's tokenization service or the per-acquirer token problem comes back.
- The orchestration engine, the vault, and the acquirer adapters live in the same trust domain. Same VPC, same data lineage, same incident-response process. If the routing engine has to call out to an external orchestration vendor for the actual routing logic, you're back to a public-network hop and the unified architecture's latency claim evaporates.
Building an open-source payment routing layer on top of this is feasible — and arguably the long-term equilibrium — but the bank-side participation is the constraint that makes the unified architecture rare. Most companies that could build the routing logic can't become an acquirer in any practical time frame. Most companies that are acquirers haven't historically invested in orchestration-quality routing engines.
For Von Payments, the unified architecture is what VORA sits on top of: the routing engine, the network-token vault, and the acquirer relationships across VON's 6+ tier-1 acquirer partners share the same operational roof. The orchestration cost is absorbed into our acquirer-spread margins rather than passed through to the merchant as a platform fee, and the latency budget stays in the under-300ms range because the routing hop is internal.
// FOR BUILDERS
What this means if you're integrating
From a builder's perspective, the unified-architecture operator looks the same as a pure-play orchestrator at first: same SDK shape, same payment-intent lifecycle, same webhook surface. The differences surface in three specific places:
- One set of credentials. A pure-play integration usually requires you to also hold and rotate credentials with each downstream acquirer. The unified operator handles that internally; the merchant has one API key, one secret, one webhook signing secret.
- Webhook semantics. Pure-play orchestrators sometimes forward acquirer webhooks verbatim, leaking the underlying acquirer's schema into the merchant's integration. Unified operators normalize the webhook surface and the merchant codes against one schema regardless of which acquirer processed the transaction.
- Failover is invisible. In a pure-play setup, failover from one acquirer to another sometimes produces secondary lifecycle events the merchant has to handle. In the unified architecture, the merchant sees one authorization with a routing trace; the failover is internal to the operator.
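On the merchant side, the single-secret, single-schema model could look something like the handler below. This is a sketch under assumptions: the signature scheme (hex HMAC-SHA256 over the raw request body), the secret format, and the payload field names are all hypothetical and are not a documented webhook contract.

```python
import hashlib
import hmac
import json

def verify_webhook(raw_body: bytes, signature_hex: str, signing_secret: bytes) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw body in constant time."""
    expected = hmac.new(signing_secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

def handle_webhook(raw_body: bytes, signature_hex: str, signing_secret: bytes) -> dict:
    """One secret and one schema, regardless of which acquirer processed the charge."""
    if not verify_webhook(raw_body, signature_hex, signing_secret):
        raise ValueError("webhook signature mismatch")
    return json.loads(raw_body)

# Hypothetical payload: field names are illustrative, not a documented schema.
secret = b"whsec_demo"
body = json.dumps(
    {"type": "payment.authorized", "routing_trace": ["acquirer_A", "acquirer_B"]}
).encode()
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
event = handle_webhook(body, signature, secret)
```

The handler never branches on the acquirer: a failover shows up only as an extra entry in the (hypothetical) `routing_trace` field of a single event.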
For technical evaluation, the test of whether you're looking at a unified or pure-play architecture is to ask for a sample webhook payload and a sample lifecycle trace of a transaction that failed over between two acquirers. The unified architecture's response is one event; the pure-play's is two or three.
For developer onboarding, the vault layer, and the routing engine specifics, follow the cross-links. If you want to compare architectures side-by-side against a pure-play setup you're evaluating right now, talk to underwriting; we'll run a latency-trace comparison on your actual transaction volume.