SCV Generation

Date: 2026-03-04 21:29:01

Purpose: Benchmark FSCS Single Customer View file generation at various account scales

Database

Source: external (BENCH_PG_DSN)
DSN: postgres://bench:***@localhost:15432/bench?sslmode=disable

SQL: SCV balance query

WITH credits AS (
    SELECT to_account_id AS account_id, SUM(amount) AS total
    FROM movements GROUP BY to_account_id
),
debits AS (
    SELECT from_account_id AS account_id, SUM(amount) AS total
    FROM movements GROUP BY from_account_id
)
SELECT a.id, a.account_id, a.product, a.currency,
    COALESCE(c.total, 0) - COALESCE(d.total, 0) AS balance
FROM accounts a
LEFT JOIN credits c ON c.account_id = a.id
LEFT JOIN debits d ON d.account_id = a.id
WHERE a.account_type = 'Liability'

Methods

Scenarios: 1K, 10K, 100K, 1M liability accounts
Schema: go-luca (accounts + movements), customers modelled via account_id field
Products: Random from IAA, ISA, NA, FD1
Balances: One seed movement per account (0–100K GBP random)
Iteration: Full SCV generation: query → group → write C file → write D file
C file: Pipe-delimited individual account records per FSCS spec
D file: Pipe-delimited per-customer aggregate, compensatable capped at £85,000
Footer: FSCS standard 99999999999999999999

Results: 1_000 accounts, 21 customers

Type	N	M	Mean	P50	P99	Min	Max	Iters
scv	1_000	21	3.47ms	3.32ms	6.34ms	1.71ms	8.06ms	50

Results: 10_000 accounts, 199 customers

Type	N	M	Mean	P50	P99	Min	Max	Iters
scv	10_000	199	30.82ms	27.50ms	48.25ms	19.11ms	68.78ms	20

Results: 100_000 accounts, 1_989 customers

Type	N	M	Mean	P50	P99	Min	Max	Iters
scv	100_000	1_989	350.58ms	359.45ms	391.76ms	291.43ms	402.16ms	5

Results: 1_000_000 accounts, 19_677 customers

Type	N	M	Mean	P50	P99	Min	Max	Iters
scv	1_000_000	19_677	2.960s	2.960s	2.960s	2.960s	2.960s	1

Purpose

FSCS requires firms to produce Single Customer View (SCV) files regularly. This benchmark measures how fast go-luca can generate the two key CSV files from a PostgreSQL database:

C file: Individual account records (one row per account, pipe-delimited)
D file: Aggregate customer balances with compensatable amounts capped at £85,000

The benchmark scales from 1K to 1M accounts. Customers have 1–100 accounts each with random balances 0–100K GBP. Products are randomly assigned from IAA, ISA, NA, and FD1.

Each iteration performs a full SCV generation: query all liability accounts with computed balances, group by customer, write pipe-delimited C and D files with the FSCS footer.

Analysis

Query Optimisation: OR Join vs CTE

The initial implementation used a single OR join to compute balances:

SELECT a.id, a.account_id, a.product, a.currency,
    COALESCE(SUM(CASE WHEN m.to_account_id = a.id THEN m.amount ELSE 0 END), 0)
  - COALESCE(SUM(CASE WHEN m.from_account_id = a.id THEN m.amount ELSE 0 END), 0) AS balance
FROM accounts a
LEFT JOIN movements m ON m.to_account_id = a.id OR m.from_account_id = a.id
WHERE a.account_type = 'Liability'
GROUP BY a.id, a.account_id, a.product, a.currency

This was replaced with a CTE-based approach that pre-aggregates credits and debits separately, then joins the summaries:

WITH credits AS (
    SELECT to_account_id AS account_id, SUM(amount) AS total
    FROM movements GROUP BY to_account_id
),
debits AS (
    SELECT from_account_id AS account_id, SUM(amount) AS total
    FROM movements GROUP BY from_account_id
)
SELECT a.id, a.account_id, a.product, a.currency,
    COALESCE(c.total, 0) - COALESCE(d.total, 0) AS balance
FROM accounts a
LEFT JOIN credits c ON c.account_id = a.id
LEFT JOIN debits d ON d.account_id = a.id
WHERE a.account_type = 'Liability'

Results

Accounts	OR Join (mean)	CTE (mean)	Speedup
1K	56.5ms	3.2ms	~18x
10K	4,598ms	33ms	~139x
100K	did not complete	338ms	—
1M	did not complete	3.5s	—

Why

The OR join (ON m.to_account_id = a.id OR m.from_account_id = a.id) prevents PostgreSQL from using a single index scan. The planner falls back to nested loops or bitmap OR scans, giving roughly O(n * m) behaviour where n = accounts and m = movements.

The CTE approach scans the movements table exactly twice (once per CTE), aggregates with hash aggregation using the existing indexes, then hash-joins the two small summary tables to accounts. This scales linearly — doubling the data roughly doubles the time.

Takeaway

For SCV generation at scale, always pre-aggregate movements into credit/debit summaries before joining to accounts. The OR join pattern is only viable for small datasets (< 1K accounts).