Scv Benchmark
Purpose
FSCS requires firms to produce Single Customer View (SCV) files regularly. This benchmark measures how fast go-luca can generate the two key CSV files from a PostgreSQL database:
- C file: Individual account records (one row per account, pipe-delimited)
- D file: Aggregate customer balances with compensatable amounts capped at £85,000
The benchmark scales from 1K to 1M accounts. Customers have 1–100 accounts each with random balances 0–100K GBP. Products are randomly assigned from IAA, ISA, NA, and FD1.
Each iteration performs a full SCV generation: query all liability accounts with computed balances, group by customer, write pipe-delimited C and D files with the FSCS footer.
Analysis
Query Optimisation: OR Join vs CTE
The initial implementation used a single OR join to compute balances:
SELECT a.id, a.account_id, a.product, a.currency,
COALESCE(SUM(CASE WHEN m.to_account_id = a.id THEN m.amount ELSE 0 END), 0)
- COALESCE(SUM(CASE WHEN m.from_account_id = a.id THEN m.amount ELSE 0 END), 0) AS balance
FROM accounts a
LEFT JOIN movements m ON m.to_account_id = a.id OR m.from_account_id = a.id
WHERE a.account_type = 'Liability'
GROUP BY a.id, a.account_id, a.product, a.currency
This was replaced with a CTE-based approach that pre-aggregates credits and debits separately, then joins the summaries:
WITH credits AS (
SELECT to_account_id AS account_id, SUM(amount) AS total
FROM movements GROUP BY to_account_id
),
debits AS (
SELECT from_account_id AS account_id, SUM(amount) AS total
FROM movements GROUP BY from_account_id
)
SELECT a.id, a.account_id, a.product, a.currency,
COALESCE(c.total, 0) - COALESCE(d.total, 0) AS balance
FROM accounts a
LEFT JOIN credits c ON c.account_id = a.id
LEFT JOIN debits d ON d.account_id = a.id
WHERE a.account_type = 'Liability'
Results
| Accounts | OR Join (mean) | CTE (mean) | Speedup |
|---|---|---|---|
| 1K | 56.5ms | 3.2ms | ~18x |
| 10K | 4,598ms | 33ms | ~139x |
| 100K | did not complete | 338ms | — |
| 1M | did not complete | 3.5s | — |
Why
The OR join (ON m.to_account_id = a.id OR m.from_account_id = a.id) prevents
PostgreSQL from using a single index scan. The planner falls back to nested loops
or bitmap OR scans, giving roughly O(n * m) behaviour where n = accounts and
m = movements.
The CTE approach scans the movements table exactly twice (once per CTE), aggregates with hash aggregation using the existing indexes, then hash-joins the two small summary tables to accounts. This scales linearly — doubling the data roughly doubles the time.
Takeaway
For SCV generation at scale, always pre-aggregate movements into credit/debit summaries before joining to accounts. The OR join pattern is only viable for small datasets (< 1K accounts).