
Building a Payment System That Proves Finality Instead of Asserting It


There is a phrase in payments infrastructure that comes up a lot in design discussions: “asserting finality.” It is what most payment systems do. At some point in the processing lifecycle, the system decides that a transaction succeeded or failed, records that decision, and moves on. The decision is authoritative because the system says it is.

I have been building something that works differently. Instead of asserting finality, it proves it.

This distinction might sound philosophical. In practice it drives almost every architectural decision in the system.

The problem with asserted finality

Traditional payment operators handle finality internally. A transaction arrives, gets processed, the system records “SUCCESS,” and that record becomes the truth. If something goes wrong downstream, if a reconciliation mismatch surfaces two weeks later, if a regulator asks for the audit trail, the system has to reconstruct the basis for that original decision from logs and database records that were never designed for that purpose.

This creates several problems that compound over time.

First, the audit trail is reconstructed rather than original. You are telling the regulator a story about what happened, assembled after the fact from data that was not collected with that story in mind.

Second, reconciliation becomes a forensic exercise. When the system’s ledger does not match a partner bank’s statement, you need to figure out which one is wrong. If neither has a cryptographic fingerprint of the facts at the time of the transaction, you are comparing two assertion-based records against each other. That is not a reliable reconciliation.

Third, disputes are expensive. A merchant who disagrees with a chargeback outcome cannot independently verify the evidence that drove the decision. The platform says it happened this way. That is the end of the conversation.

I wanted to build a system where none of these problems exist.

The core idea: every outcome is evidenced, not asserted

The architecture I have been building works around a separation that most payment systems do not make explicit: the separation between execution and finality.

Execution happens in one layer. A payment command goes in, gets processed, a cryptographic receipt comes out. The receipt captures the material facts of the execution as a canonical JSON document and hashes it with SHA-256. The hash is committed. This is the execution record. It does not claim any particular outcome. It just records what happened, in a form that cannot be altered without detection.

# Execution receipt: what was done, hashed for integrity
receipt = {
    "receipt_id": "rcpt_a3f9b2c1",
    "subject_id": "pay_0191xyz",
    "correlation_id": "merchant_ref_8821",
    "command_type": "PAYMENT_EXECUTE",
    "amount_minor": 150000,
    "currency": "NGN",
    "executed_at": "2025-09-08T14:22:31.004Z",
    "rail_facts": { ... }  # execution-layer specifics
}

receipt_hash = sha256(canonical_json(receipt))
# This hash is committed to the evidence store.
# The receipt cannot be altered without the hash changing.
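The canonical_json helper is not shown above; a minimal sketch, assuming the canonical form is sorted keys with compact separators (a common convention, not necessarily the exact one the system uses), could look like:

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    # Deterministic serialization: key order and whitespace are fixed,
    # so the same facts always produce the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# The hash depends only on content, never on key order.
receipt = {"receipt_id": "rcpt_a3f9b2c1", "amount_minor": 150000, "currency": "NGN"}
receipt_hash = sha256(canonical_json(receipt))
```

Reordering the dict's keys yields the same hash; changing any value does not, which is what makes the receipt tamper-evident.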

Finality is decided in a separate layer, by a separate system that I call the truth kernel. The execution layer submits evidence to the truth kernel. The truth kernel evaluates that evidence against a versioned policy, records its decision with a snapshot of the policy that drove it, and issues a ruling. The ruling is what the merchant sees. The underlying evidence is what makes the ruling auditable.

Execution vs. finality: how the two layers interact

EXECUTION LAYER: Processes payment commands. Maintains a double-entry ledger. Produces cryptographic receipts. Dispatches evidence to the truth kernel via a transactional outbox. Does NOT decide whether a payment succeeded.

TRUTH KERNEL: Receives evidence. Evaluates it against a versioned policy. Records the decision with a snapshot of that policy. Issues finality rulings: NON_FINAL, FINAL_SUCCESS, or FINAL_FAILURE. Does NOT execute payments. Does NOT hold funds.

Key property: Every finality decision carries a policy_hash, the cryptographic fingerprint of the policy version that governed it. The decision can be deterministically replayed years later against the same policy snapshot.
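A minimal sketch of what deterministic replay could look like. The decide function and the field names (rail_status, max_attempts, decision) are illustrative assumptions, not the system's actual policy schema; the point is that the ruling stores the policy_hash at decision time, so replay can verify both the snapshot's integrity and the decision itself:

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    # Deterministic serialization so identical content hashes identically
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def decide(evidence: dict, policy: dict) -> str:
    # Illustrative policy evaluation: a pure function of evidence + policy
    if evidence["rail_status"] == "settled":
        return "FINAL_SUCCESS"
    if evidence["attempts"] >= policy["max_attempts"]:
        return "FINAL_FAILURE"
    return "NON_FINAL"

def replay(ruling: dict, evidence: dict, policy_snapshot: dict) -> bool:
    # First confirm the snapshot is the policy that governed the decision,
    # then confirm the decision follows from evidence + policy.
    if sha256(canonical_json(policy_snapshot)) != ruling["policy_hash"]:
        return False
    return decide(evidence, policy_snapshot) == ruling["decision"]
```

Because decide takes nothing but committed inputs, an auditor holding the evidence and the policy snapshot can re-derive the ruling without trusting the system's say-so.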

The thing I find most important about this design is what it does for the audit trail. Because every finality decision is linked to the evidence that drove it, and the evidence has a cryptographic fingerprint that matches the original execution record, the audit trail is original, not reconstructed. If a regulator asks to review the basis for a particular payment outcome five years from now, the system can replay the decision deterministically from the committed evidence. There is nothing to reconstruct.

The reconciliation engine has no side effects

One of the problems I struggled with early in the design was reconciliation. Reconciliation is inherently a comparison operation: you compare your records against a partner’s statement and find the differences. But in most systems, reconciliation is also stateful. It updates records, creates adjustments, triggers notifications. The output depends on the state of the system at the time it runs.

This makes reconciliation non-deterministic. Run it twice on slightly different days and you may get different results, because the system state changed between runs.

I built the reconciliation engine as a pure function. Same inputs, same outputs, always.

# Deterministic reconciliation: pure function, no side effects
# Inputs: partner statement records + receipt lookup from execution layer
# Output: matches + mismatches + cryptographic fingerprints of both

def reconcile(statement_records, receipt_lookup):
    matches = []
    mismatches = []

    for record in statement_records:
        receipt = receipt_lookup.find(
            by_receipt_id=record.receipt_id,
            by_correlation_id=record.correlation_id
        )
        if receipt:
            matches.append(Match(record, receipt))
        else:
            mismatches.append(Mismatch(record, reason="missing_execution_match"))

    # Both inputs and outputs get cryptographic fingerprints
    inputs_hash = sha256(canonical_json(statement_records))
    outputs_hash = sha256(canonical_json({"matches": matches, "mismatches": mismatches}))

    return ReconciliationResult(matches, mismatches, inputs_hash, outputs_hash)
    # Run this again on the same inputs tomorrow: same result, same hashes.

The inputs hash and outputs hash are committed alongside the reconciliation results. If anyone questions whether the reconciliation was performed correctly, or whether the results have been tampered with since, the hashes provide the proof. This is the same principle as the receipt hashing in the execution layer, applied to the reconciliation workflow.

Partner onboarding as a conformance problem

The system is not self-contained. Real payments go through licensed partners: banks, payment service providers, microfinance institutions. The question of how to onboard these partners is not purely a technical problem. It is a trust problem. Before a partner can process real money, I need to know that their implementation handles the failure modes correctly.

The approach I landed on is conformance certification. Before a partner is enabled for production, they must pass a battery of scenarios that test specific behaviors:

Duplicate delivery handling. If the same transaction is delivered twice, the system should accept exactly one and drop the duplicate. Not process it twice.

Out-of-order sequence handling. Transaction events from a partner may arrive out of sequence. The system needs to handle that gracefully, either by quarantining until the sequence resolves or by reordering.

Retry budget enforcement. If a transaction times out, the system should retry up to a defined limit. It should not retry indefinitely. It should not exceed the retry budget.

Statement reconciliation. The partner’s settlement statement should match the execution layer’s records. Known mismatches should be predictable and bounded.

Partner conformance certification: gate structure before production enablement

- Duplicate delivery: exactly one accepted per unique delivery key (REQUIRED)
- Out-of-order sequence: QUARANTINE or reorder strategy applied consistently (REQUIRED)
- Retry budget: attempts do not exceed configured max, no infinite retry loops (REQUIRED)
- Statement reconciliation: mismatch count matches declared expected count (REQUIRED)
- Governance approval: proposer + approver sign-off before REAL mode enabled (GATE)

Each scenario run produces cryptographic evidence in the same form as the execution layer receipts: an inputs hash and an outputs hash committed alongside the results. Conformance is not just a checkbox; the conformance report is a verifiable artifact. If a partner's behavior regresses after certification, the regression is detectable because the hashes from the new run will not match the hashes from the certification run.

The webhook problem and timing attacks

One thing that surprised me early in the design was how much care the webhook delivery system required. Webhooks are how merchants get notified of payment outcomes. The naive implementation — sign the payload, send it, log whether it was received — turns out to have several subtle failure modes.

The signature algorithm matters more than it seems. HMAC-SHA256 is the right primitive. But the verification implementation needs to use a constant-time comparison. If you use a standard equality check, the response time of the verification leaks information about how many bytes of the signature match. An attacker can use that timing side channel to forge signatures byte by byte. This is not theoretical. The fix is a single call to a constant-time comparison function (timingSafeEqual in Node.js, hmac.compare_digest in Python), but you have to know to use it.

# Correct webhook signature verification
import hmac, hashlib

def verify_signature(secret: str, payload: bytes, header: str) -> bool:
    expected_mac = hmac.new(
        secret.encode("utf-8"),
        payload,
        hashlib.sha256
    ).hexdigest()
    expected = f"v1={expected_mac}"

    # hmac.compare_digest runs in constant time, preventing timing attacks.
    # A regular == comparison leaks information about where the strings diverge.
    return hmac.compare_digest(expected, header)

I have canonical test vectors for the signature algorithm committed to the repository. Any SDK implementation for any language must produce the same output for the same inputs. The CI pipeline cross-checks the vectors across every implementation to catch drift before it reaches production.
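A sketch of what that cross-check can look like. The vector values, field names, and the reference_sign helper are illustrative, and in the real pipeline the expected signatures would be fixed strings committed to the repository rather than generated at check time:

```python
import hashlib
import hmac

def reference_sign(secret: str, payload: bytes) -> str:
    # Reference HMAC-SHA256 signature in the v1=<hex> header format
    mac = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return f"v1={mac}"

def check_vectors(sign_impl, vectors) -> bool:
    # An SDK implementation passes only if it reproduces every vector exactly
    return all(
        sign_impl(v["secret"], v["payload"]) == v["expected"]
        for v in vectors
    )

# Illustrative vector, generated here from the reference implementation;
# committed vectors would carry hard-coded expected values instead.
vectors = [{"secret": "whsec_demo", "payload": b'{"event":"payment.final"}'}]
for v in vectors:
    v["expected"] = reference_sign(v["secret"], v["payload"])
```

Running check_vectors against each language SDK's signing function in CI is what turns "the SDKs agree" from an assumption into a tested property.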

What I have not built yet

This system is in active development and there is a lot of road ahead.

The execution mode gating — local development, simulated (UAT), production — is in place and working. The first partner has completed conformance certification and is ready for production smoke testing. The governance layer, the audit trail, and the evidence system are architecturally complete.

What I am still building is the full merchant product layer: the dashboard, the dispute workflow, the payout management, the webhook administration. These are the surfaces that most users will interact with. Getting them right requires the same attention to the execution mode boundary and the same commitment to truthful states at every level.

The hardest design problem left is the incident management workflow. When something goes wrong in production, the system needs to surface it, contain it, drive it to resolution, and produce an evidence artifact that proves the incident was handled correctly. The architecture for this exists. The implementation is in progress.

I will keep writing about this as it develops. The architecture is still the most interesting part of the problem, and there is more to say about the policy versioning system, the Merkle-style commitment structure, and the approach to execution mode gating that I have not gotten into here.


If you are working on payments infrastructure, finality semantics, or auditability-first system design, I would be glad to compare notes.
