Why is my pgvector table so much larger than dimensions times 4 bytes?

Because the raw array is only one layer. The real footprint adds per-row heap overhead (tuple header, item pointer, alignment, TOAST pointer), the ANN index (1.1x–1.4x the raw payload for IVFFlat, or 1.8x–2.5x for HNSW, which copies every vector and adds a neighbor graph), transient WAL during ingestion, and MVCC dead-tuple bloat from re-embedding. The index alone is often larger than the heap.

How much storage do 10 million 1024-dimension embeddings need?

About 75 GB provisioned for halfvec plus an HNSW index at m=16: roughly 20 GB raw payload, 0.3 GB tuple and page overhead, ~39 GB for the HNSW index at a 1.9x multiplier, and ~9 GB of WAL and TOAST transient buffer, plus 10 percent headroom. On float32 vector the raw payload doubles to ~41 GB and the total rises accordingly.

Does halfvec really halve total storage?

It halves the raw array, not the whole table. Fixed per-row overhead and the HNSW graph pointers do not shrink, so total savings are smaller than 50 percent. Validate recall on a holdout set first, since L2 distance amplifies float16 rounding more than cosine similarity does.

How do I verify a storage projection before provisioning?

Load a representative sample of 10k-50k real embeddings with the production data type and index built, then read heap, TOAST, and index sizes with pg_relation_size and pg_indexes_size and scale linearly to 10 million. Run VACUUM before measuring so sample bloat does not inflate the projection.

Calculating pgvector Storage Requirements for 10M Embeddings

Sizing disk and memory for 10 million vector embeddings from a naive dimensions × 4 bytes formula routinely underprovisions storage by 2x–4x and triggers an emergency volume resize mid-ingestion. This page gives a deterministic, five-step procedure that attributes every byte to a real layer — raw payload, heap and page overhead, the approximate nearest neighbor (ANN) index multiplier, WAL, and MVCC bloat — so you can provision a 10M-row vector table with sub-10% variance instead of guesswork.

The trap is treating storage as a single number. On-disk footprint is the sum of independently scaling layers, and each responds to a different lever: precision choice shrinks the payload, fillfactor governs heap padding, the algorithm choice sets the index multiplier, and batch size drives WAL volume. Measure a small sample, attribute the bytes correctly, then extrapolate to 10M — the same layered model used across the pgvector Storage Overhead Analysis reference.

Prerequisites

pgvector 0.7+ for halfvec (float16) support, which halves the raw payload; check with SELECT extversion FROM pg_extension WHERE extname = 'vector';. Version 0.5+ works if you stay on the vector type.
PostgreSQL 15+ recommended for the measurement pass. pg_stat_progress_create_index has been available since PostgreSQL 12, and parallel HNSW index builds additionally require pgvector 0.6+.
A representative sample table — at least 10,000–50,000 real embeddings from your production model, not random floats. Sample size drives the accuracy of the extrapolation.
Your embedding dimensionality D and distance metric decided up front — precision tolerance depends on the metric, as covered in cosine vs L2 distance metrics.
A chosen data type before you measure; the vector vs halfvec decision is the single largest lever and is detailed in vector data type selection.

Step-by-step procedure

1. Establish the raw vector payload

Start from the array itself. For 10,000,000 embeddings at D dimensions, float32 (vector) stores 4 bytes per component and float16 (halfvec) stores 2:

SQL

-- Raw payload projection for both precisions at 10M rows
SELECT
  10000000 AS rows,
  1024      AS dims,
  pg_size_pretty(10000000::bigint * 1024 * 4) AS vector_float32,
  pg_size_pretty(10000000::bigint * 1024 * 2) AS halfvec_float16;
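-- pg_size_pretty reports binary units: vector_float32 ≈ 38 GB, halfvec_float16 ≈ 19 GB
-- (40.96 GB and 20.48 GB in the decimal units used in the rest of this page)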

At 1024 dimensions the float32 payload is ~40.96 GB; casting to halfvec cuts it to ~20.48 GB. Confirm the real per-row size on your sample rather than trusting the arithmetic, because pgvector prefixes each array with an 8-byte header (4-byte varlena length, 2-byte dimension count, 2 reserved bytes), so a vector occupies 4 × D + 8 bytes and a halfvec 2 × D + 8 bytes:

SQL

SELECT pg_column_size(embedding) AS bytes_per_vector
FROM your_sample_table LIMIT 1;

Because float4 bit patterns are high-entropy, vectors are effectively incompressible — default_toast_compression (whether pglz or LZ4) buys almost nothing, so budget the payload uncompressed.
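
The per-vector arithmetic in this step can be sketched in Python as a cross-check against what pg_column_size reports. This is a minimal sketch, assuming the 8-byte header described above and component widths of 4 bytes for vector and 2 bytes for halfvec; the helper name stored_vector_bytes is illustrative and not part of pgvector.

```python
# Sketch: expected stored bytes for one pgvector value (hypothetical helper).
COMPONENT_BYTES = {"vector": 4, "halfvec": 2}  # float32 vs float16 components
HEADER_BYTES = 8  # 4-byte varlena length + 2-byte dim count + 2 more header bytes


def stored_vector_bytes(dims: int, kind: str = "vector") -> int:
    """Return header plus component bytes for a single embedding."""
    return HEADER_BYTES + dims * COMPONENT_BYTES[kind]


if __name__ == "__main__":
    for kind in COMPONENT_BYTES:
        per_row = stored_vector_bytes(1024, kind)
        total_gb = per_row * 10_000_000 / 1e9  # decimal GB
        print(f"{kind}: {per_row} bytes/row, {total_gb:.2f} GB at 10M rows")
```

For 1024 dimensions this gives 4,104 bytes per vector and 2,056 bytes per halfvec. If pg_column_size on your sample disagrees, trust the measurement and adjust the model.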

2. Add heap tuple, page, and alignment overhead

PostgreSQL stores rows in 8 KB pages, and each row carries a 23-byte tuple header (padded to 24 bytes), a 4-byte item pointer, and up to 7 bytes of 8-byte alignment padding. At 10M rows this metadata alone adds roughly 270–320 MB. When the whole tuple crosses the TOAST_TUPLE_THRESHOLD (~2 KB), which a 1024-D vector at 4 KB always does, the payload moves out-of-line into a TOAST relation, leaving an 18-byte pointer in the main heap. Reserve fillfactor headroom so that updated row versions can stay on the same page and updates to non-indexed columns can use Heap-Only Tuple (HOT) updates:

SQL

ALTER TABLE embeddings SET (
  fillfactor = 90,
  autovacuum_vacuum_insert_threshold = 500,
  autovacuum_vacuum_insert_scale_factor = 0.05
);

Measure the true heap and TOAST size on your sample to fold real per-row overhead into the projection:

SQL

SELECT
  pg_size_pretty(pg_relation_size('your_sample_table'))       AS heap_main,
  pg_size_pretty(pg_total_relation_size('your_sample_table')
                 - pg_relation_size('your_sample_table'))     AS toast_and_indexes;
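
To see where the 270–320 MB figure comes from, the fixed per-row metadata can be tallied directly. This is a minimal sketch, assuming a 64-bit build with 8-byte maximum alignment; the helper names are illustrative, and the result covers only the tuple header and item pointer, not column data or the TOAST pointer.

```python
# Sketch: fixed per-row heap metadata, ignoring column data (assumes 8-byte MAXALIGN).
TUPLE_HEADER_BYTES = 23  # heap tuple header before alignment
ITEM_POINTER_BYTES = 4   # line pointer in the page's item array
MAXALIGN = 8


def align_up(size: int, boundary: int = MAXALIGN) -> int:
    """Round size up to the next multiple of boundary."""
    return (size + boundary - 1) // boundary * boundary


def row_metadata_bytes() -> int:
    """Aligned tuple header plus its item pointer."""
    return align_up(TUPLE_HEADER_BYTES) + ITEM_POINTER_BYTES


def metadata_total_mb(rows: int) -> float:
    """Total metadata for `rows` tuples, in decimal megabytes."""
    return rows * row_metadata_bytes() / 1e6
```

At 10 million rows this yields 280 MB, inside the range quoted above. The measured sample remains the authority, since column alignment and page-level free space add to it.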

3. Apply the ANN index multiplier

The index is usually the largest object in a vector schema, and its multiplier depends entirely on the algorithm — the decision framed in HNSW vs IVFFlat algorithm selection:

IVFFlat stores centroids plus inverted lists that hold a copy of each vector alongside its tuple pointer; footprint scales with lists and rows, landing at 1.1x–1.4x the raw payload. Size lists ≈ sqrt(rows) ≈ 3162 for 10M, the sizing worked in tuning IVFFlat lists for high-throughput similarity search.
HNSW stores a full copy of every vector inside the index plus a multi-layer neighbor graph of up to m edges per node per layer (2 × m on the base layer), consuming 1.8x–2.5x the raw payload. Density is driven by m and ef_construction, calibrated in optimizing m and ef_construction parameters.

Build the index on the sample and read its real multiplier rather than assuming:

SQL

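-- Divide by heap + TOAST (pg_table_size): the vectors live in TOAST,
-- so dividing by pg_relation_size alone would overstate the multiplier
SELECT pg_size_pretty(pg_relation_size('idx_sample_embedding_hnsw')) AS index_size,
       round(pg_relation_size('idx_sample_embedding_hnsw')::numeric
             / pg_table_size('your_sample_table'), 2)               AS multiplier_vs_heap;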
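
The rule of thumb and multiplier ranges in this step can be turned into a quick estimate to compare with the measured index. This is only a sketch: the multiplier ranges are the ones quoted in this section rather than guarantees, and the function names are hypothetical.

```python
# Sketch: IVFFlat lists sizing and index-size ranges from the multipliers quoted above.
import math

INDEX_MULTIPLIERS = {"ivfflat": (1.1, 1.4), "hnsw": (1.8, 2.5)}  # x raw payload


def ivfflat_lists(rows: int) -> int:
    """lists ~ sqrt(rows), the rule of thumb used in this section."""
    return math.isqrt(rows)


def index_size_range_gb(payload_gb: float, algorithm: str) -> tuple[float, float]:
    """Low and high index-size estimates for a given raw payload in GB."""
    low, high = INDEX_MULTIPLIERS[algorithm]
    return payload_gb * low, payload_gb * high
```

If the measured multiplier falls outside the range for your algorithm, recheck that the index finished building and that the comparison uses the same precision as production.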

4. Budget WAL and TOAST for ingestion

Loading 10M rows generates write-ahead log volume proportional to the serialized payload — unbatched inserts can produce 50–120 GB of WAL before checkpoints and archiving reclaim it. This is transient but must exist on disk during the load. Cut it by batching with COPY and widening the checkpoint interval:

PYTHON

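# Batched ingestion: one COPY and one commit per batch (psycopg 3)
from itertools import islice

import psycopg
from pgvector.psycopg import register_vector  # pgvector-python adapter for vector types

BATCH = 10_000
COPY_SQL = "COPY embeddings (id, embedding) FROM STDIN WITH (FORMAT BINARY)"

with psycopg.connect(dsn) as conn:        # dsn: your connection string
    register_vector(conn)
    rows = iter(stream_embeddings())      # your generator of (id, embedding) pairs
    while batch := list(islice(rows, BATCH)):
        with conn.cursor() as cur, cur.copy(COPY_SQL) as copy:
            # Binary COPY needs explicit types; assumes a bigint id, use "halfvec" for halfvec columns
            copy.set_types(["int8", "vector"])
            for row_id, vec in batch:
                copy.write_row((row_id, vec))
        conn.commit()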

SQL

-- Fewer checkpoints during the bulk load; revert after
ALTER SYSTEM SET max_wal_size = '16GB';
SELECT pg_reload_conf();

5. Add MVCC bloat headroom and sum the total

Every re-embedding UPDATE writes a new tuple version and marks the old one dead; because the update changes the indexed column, HOT rarely applies, so both heap and index bloat until VACUUM catches up. Reserve 15–30% headroom for write-heavy refresh cycles (the worked example below uses a lighter 10%), then sum the layers. For 10M embeddings at 1024 dimensions on halfvec + HNSW:

Component	Calculation	Estimated size
Base payload	`10M × 1024 × 2 bytes`	20.48 GB
Tuple / page / alignment	`~32 bytes/row`	0.32 GB
HNSW index (`m = 16`)	`1.9× multiplier`	38.91 GB
WAL + TOAST transient buffer	`15% of the three rows above`	8.96 GB
Total provisioned	`68.67 GB sum × 1.10 headroom`	~75 GB
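
The table's arithmetic can be reproduced in a few lines, which makes it easy to rerun with multipliers measured on your own sample. This is a sketch under the table's own assumptions (1.9x HNSW multiplier, 15% transient buffer, 10% headroom); the function and parameter names are illustrative.

```python
# Sketch: reproduce the layered total from the table (all sizes in decimal GB).
def provisioned_gb(
    rows: int = 10_000_000,
    dims: int = 1024,
    bytes_per_component: int = 2,      # halfvec
    row_overhead_bytes: int = 32,      # tuple / page / alignment estimate
    index_multiplier: float = 1.9,     # HNSW at m = 16, from the table
    transient_fraction: float = 0.15,  # WAL + TOAST buffer
    headroom_fraction: float = 0.10,
) -> dict[str, float]:
    payload = rows * dims * bytes_per_component / 1e9
    overhead = rows * row_overhead_bytes / 1e9
    index = payload * index_multiplier
    transient = (payload + overhead + index) * transient_fraction
    subtotal = payload + overhead + index + transient
    return {
        "payload": payload,
        "overhead": overhead,
        "index": index,
        "transient": transient,
        "total": subtotal * (1 + headroom_fraction),
    }
```

Calling provisioned_gb(bytes_per_component=4) gives the float32 equivalent under the same multipliers; swap in your measured index multiplier before relying on the result.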

Parameter reference

Name	Type	Default	Production recommendation	Notes
`fillfactor`	int	`100`	`90`	Leaves free space in each page so updated row versions can stay on the same page; HOT applies only when no indexed column changes, so re-embedding updates still touch the index.
`default_toast_compression`	enum	`pglz`	`lz4` (marginal)	Vectors are near-incompressible; do not count on it to shrink the payload.
`lists` (IVFFlat)	int	`100`	`≈ sqrt(rows)` (~3162 at 10M)	Drives index size and recall; too few lists inflate scan cost, too many inflate the centroid table.
`m` (HNSW)	int	`16`	`16`–`32`	Edges per node; each step up raises the index multiplier toward 2.5x.
`ef_construction` (HNSW)	int	`64`	`128`–`256`	Build-time graph density; higher values grow both build time and index size.
`maintenance_work_mem`	memory	`64MB`	`2GB`–`16GB`	Must hold the build working set or the index spills to disk.
`max_wal_size`	memory	`1GB`	`8GB`–`32GB`	Bounds checkpoint frequency during bulk load; low values inflate transient WAL churn.
`autovacuum_vacuum_insert_scale_factor`	float	`0.2`	`0.05`	Triggers earlier vacuums on insert-heavy vector tables to cap dead-tuple bloat.

Verification

Load a known sample, measure every layer, and extrapolate linearly to 10M — the reliable way to confirm the projection before provisioning:

SQL

-- Full accounting for a loaded sample, then scale to 10M
WITH sizes AS (
  SELECT
    (SELECT count(*) FROM embeddings)                            AS n,
    pg_relation_size('embeddings')                              AS heap,
    pg_total_relation_size('embeddings')
      - pg_relation_size('embeddings')
      - pg_indexes_size('embeddings')                           AS toast,
    pg_indexes_size('embeddings')                               AS indexes
)
SELECT
  pg_size_pretty(heap)                                          AS heap_now,
  pg_size_pretty(indexes)                                       AS indexes_now,
  pg_size_pretty(((heap + toast + indexes)::numeric / n * 10000000)::bigint)
                                                                AS projected_10m
FROM sizes;

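If projected_10m lands within 10% of the sum of the payload, overhead, and index rows in the table above (~59.7 GB for halfvec + HNSW), the model holds; the WAL buffer and headroom rows are provisioning margin and do not appear in measured relation sizes. Cross-check bloat separately so a bloated sample does not inflate the projection: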

SQL

SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables WHERE relname = 'embeddings';
-- run VACUUM before measuring if dead_pct is high
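
The linear scaling and the 10% check from this section can also be done client-side once the sample sizes are in hand. This is a minimal sketch, assuming per-row cost stays roughly constant between the sample and 10M rows; the function names and the example figures in the usage note are hypothetical.

```python
# Sketch: scale measured sample sizes to a target row count and check the tolerance.
def project_bytes(sample_bytes: int, sample_rows: int, target_rows: int = 10_000_000) -> int:
    """Linear extrapolation; assumes per-row cost is constant across scales."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_bytes * target_rows // sample_rows


def within_tolerance(projected: float, modeled: float, tolerance: float = 0.10) -> bool:
    """True when the projection is within `tolerance` of the modeled size."""
    return abs(projected - modeled) <= tolerance * modeled
```

For instance, a 50,000-row sample occupying 300 MB across heap, TOAST, and indexes projects to 60 GB at 10M rows, which can then be compared with the modeled figure for the same precision and algorithm.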

Troubleshooting

Projection is 2x low. You measured before building the ANN index, or measured with a different index type than the one you plan to serve (for example IVFFlat instead of HNSW). Re-run step 3 with the production index actually built, then read multiplier_vs_heap; HNSW commonly adds more bytes than the entire heap.
projected_10m keeps climbing between runs. The sample table is bloating under repeated UPDATEs. Check n_dead_tup with the verification query, run VACUUM embeddings, and remeasure; a re-embedding job can double effective size faster than autovacuum reclaims it.
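Disk fills during ingestion, then drops. Transient WAL, not table growth. Confirm by checking the on-disk WAL size during the load with SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();. To see how much WAL a batch generates, record SELECT pg_current_wal_lsn(); before and after it and pass both values to pg_wal_lsn_diff. Then raise max_wal_size and switch to batched COPY (step 4) to bound per-transaction WAL.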
halfvec saved less than half. Fixed per-row overhead (tuple header, page, TOAST pointer) does not shrink with precision, and the HNSW graph pointers are unchanged — only the raw array halves. Recompute with the layered model instead of applying 0.5x to the whole table.
Recall dropped after switching to halfvec. L2 distance amplifies float16 rounding more than cosine; validate the recall delta on a holdout set before committing, following cosine vs L2 distance metrics.

Related

Vector data type selection: choose vector, halfvec, or sparsevec to set the base payload before you size anything else
Cosine vs L2 distance metrics: how metric choice governs the precision you can afford to drop
Resolving pgvector index build timeout errors: size maintenance_work_mem against the index footprint estimated here
Tuning IVFFlat lists for high-throughput similarity search: the lists value that drives the IVFFlat multiplier