pgvector Storage Overhead Analysis: Heap, Index, and Bloat Diagnostics

Storage overhead in pgvector is rarely the linear dimensions × 4 bytes calculation teams assume when they size their first cluster. At production scale the gap between raw embedding size and real disk consumption is driven by PostgreSQL heap alignment, TOAST relocation thresholds, ANN index topology, and MVCC dead-tuple bloat — and if you plan capacity from the naive formula you will underprovision storage by 2x–4x, trip an unexpected cloud storage tier migration, and watch query latency climb as bloated indexes spill out of shared_buffers. This page dissects where the bytes actually go, gives you the diagnostic SQL to measure each layer, and shows how to keep vector infrastructure lean across the write-heavy refresh cycles typical of retrieval pipelines.

Architectural Divergence & Trade-offs

Total on-disk footprint for a vector table is the sum of three independently growing layers, and diagnosing overhead means attributing bytes to the right one before you reach for a fix. The three layers behave differently enough that a remedy for one does nothing for the others.

Layer 1 — Heap tuple storage (alignment + TOAST). PostgreSQL stores a vector column as a varlena (variable-length) structure: a 4-byte length header, a 2-byte dimension count, 2 bytes of flags/reserved, then the raw float4 array. Heap tuples are padded to 8-byte boundaries, so 0–7 bytes of alignment padding appear per row depending on the preceding column types. Once the whole tuple crosses the TOAST_TUPLE_THRESHOLD (~2 KB), PostgreSQL moves the vector payload out-of-line into a TOAST relation, leaving an 18-byte pointer in the main heap plus per-chunk TOAST headers. A 1536-dimension vector is 6 KB of raw floats, so it always TOASTs; a 384-dimension vector at 1.5 KB usually stays inline. Critically, pgvector data is effectively incompressible (float4 bit patterns are high-entropy), so default_toast_compression buys you almost nothing — the payload is stored external but uncompressed. This is the layer the pgvector Architecture & Vector Fundamentals reference covers in depth, and it is also where vector data type selection has the largest leverage: switching vector to halfvec halves the raw array before any of these thresholds apply.
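
As a rough illustration of the arithmetic above, the sketch below estimates the size of one vector value and whether its tuple is likely to cross the TOAST threshold. The constant values (a 24-byte padded tuple header and a 2032-byte threshold for default 8 KB pages) and the helper names are assumptions made for this example, not values read from a live server.

```python
# Sketch only: the constants are assumptions for a default 8 KB-page build.
VECTOR_HEADER_BYTES = 8        # 4-byte varlena length + 2-byte dim count + 2 bytes reserved
TUPLE_HEADER_BYTES = 24        # 23-byte heap tuple header, padded to 8-byte alignment
TOAST_TUPLE_THRESHOLD = 2032   # approximate; the text rounds this to ~2 KB


def vector_datum_bytes(dims: int, element_size: int = 4) -> int:
    """Bytes for one vector value: header plus the raw element array."""
    return VECTOR_HEADER_BYTES + dims * element_size


def likely_toasted(dims: int, element_size: int = 4, other_columns_bytes: int = 0) -> bool:
    """True when the whole tuple would exceed the TOAST threshold."""
    tuple_bytes = (
        TUPLE_HEADER_BYTES
        + other_columns_bytes
        + vector_datum_bytes(dims, element_size)
    )
    return tuple_bytes > TOAST_TUPLE_THRESHOLD


if __name__ == "__main__":
    for dims in (384, 768, 1536):
        print(dims, vector_datum_bytes(dims), likely_toasted(dims))
```

Passing element_size=2 models halfvec, which is how a 768-dimension column can drop back under the threshold and stay inline.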

Layer 2 — ANN index structure. The index is a separate relation with its own scaling law. An ivfflat index stores a centroid list plus inverted lists whose entries each hold a tuple pointer and a copy of the vector; its size scales roughly as (lists × dimensions × 4) + (rows × entry_size) and typically lands at 0.8x–1.2x the raw vector volume. An hnsw index also stores a full copy of every vector, plus a multi-layer neighbor graph in which each node keeps up to m neighbors per upper layer and up to 2 × m on the base layer; HNSW indexes routinely consume 1.5x–2.5x the raw vector volume. That means the index, not the heap, is often the largest object in a vector schema. Choosing between them is the subject of the HNSW vs IVFFlat algorithm selection framework, and the multiplier is tuned via optimizing m and ef_construction parameters.

Layer 3 — MVCC dead tuples (bloat). Every UPDATE to a vector row writes a new tuple version and marks the old one dead; the old version occupies space until VACUUM reclaims it. Because vector tuples are large and updates change the indexed column, HOT (Heap-Only Tuple) updates rarely apply — each update also inserts a new index entry, bloating both heap and index. Under a re-embedding job that rewrites millions of rows, dead tuples can double the effective size of a table faster than autovacuum keeps up.

Overhead layer	Scales with	Typical multiplier vs raw	Primary lever
Heap + TOAST	rows × (dims × element_size + ~28 B)	1.0x–1.15x	Type choice (`halfvec`, `sparsevec`)
HNSW index	rows × (dims × element_size + m×edge)	1.5x–2.5x	`m`, drop/rebuild strategy
IVFFlat index	rows × (dims × element_size + pointer) + lists × dims × 4	0.8x–1.2x	`lists`
MVCC dead tuples	update/delete rate ÷ vacuum rate	0x–2x+ (unbounded)	`autovacuum` tuning
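
To turn the table into a planning number, a minimal sketch follows. It simply applies the multiplier ranges from the table to the raw volume; the function name, its parameters, and the idea of expressing dead tuples as a multiple of raw volume are assumptions for illustration, and the output is a coarse bound rather than a measurement.

```python
# Multiplier ranges copied from the table above; treat them as rough planning bounds.
MULTIPLIERS = {
    "heap_toast": (1.0, 1.15),
    "hnsw": (1.5, 2.5),
    "ivfflat": (0.8, 1.2),
}


def footprint_range(rows, dims, element_size=4, index="hnsw", dead_multiplier=0.0):
    """Return (low, high) estimated bytes for heap + one ANN index + dead tuples.

    dead_multiplier is the space held by dead tuples as a multiple of the raw
    vector volume (0.0 models a freshly compacted table).
    """
    raw = rows * dims * element_size
    heap_lo, heap_hi = MULTIPLIERS["heap_toast"]
    idx_lo, idx_hi = MULTIPLIERS[index]
    low = raw * (heap_lo + idx_lo + dead_multiplier)
    high = raw * (heap_hi + idx_hi + dead_multiplier)
    return int(low), int(high)


if __name__ == "__main__":
    low, high = footprint_range(1_000_000, 1536)
    print(f"{low / 1e9:.1f} GB to {high / 1e9:.1f} GB")
```

Comparing the result for element_size=4 and element_size=2 gives a quick feel for how much of the budget the type choice controls before any index tuning.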

Parameter Space & Diagnostic Workflow

You cannot manage what you have not attributed. The workflow below measures each layer separately so you know which lever to pull. Start with the combined footprint, then split it.

SQL

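-- Combined size: heap + TOAST + all indexes, human-readable.
-- The toast figure is derived by subtraction, so it also includes the small
-- free-space-map and visibility-map forks and the TOAST relation's own index.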
SELECT
  pg_size_pretty(pg_total_relation_size('documents'))              AS total,
  pg_size_pretty(pg_relation_size('documents'))                    AS heap_main,
  pg_size_pretty(pg_total_relation_size('documents')
               - pg_relation_size('documents')
               - COALESCE(pg_indexes_size('documents'), 0))        AS toast,
  pg_size_pretty(pg_indexes_size('documents'))                     AS indexes;

Break the index number down per-index so you can see the HNSW multiplier directly:

SQL

SELECT
  indexrelname               AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY pg_relation_size(indexrelid) DESC;

Bloat attribution needs the dead-tuple ratio. pg_stat_user_tables gives a fast estimate; pgstattuple gives an exact (but scan-heavy) measurement:

SQL

-- Fast estimate from the cumulative statistics views.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 3) AS dead_ratio,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'documents';

-- Exact measurement (requires: CREATE EXTENSION pgstattuple;). Reads every heap page.
-- pgstattuple_approx('documents') is a cheaper estimate that skips all-visible pages.
SELECT free_percent, dead_tuple_percent, dead_tuple_len
FROM pgstattuple('documents');
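
Once the three queries have run, the attribution itself is simple arithmetic. The sketch below is one possible way to encode it: the function name, the order in which the checks run, and the 0.15 dead-ratio cut-off (borrowed from the alert thresholds later on this page) are assumptions, so adjust them to your own policy.

```python
def dominant_layer(total_bytes, heap_main_bytes, index_bytes, dead_ratio,
                   dead_ratio_limit=0.15):
    """Name the layer most likely to be driving overhead.

    Inputs are the raw byte counts behind the size queries (without
    pg_size_pretty) and the dead_ratio from pg_stat_user_tables.
    """
    # Same subtraction as the SQL above: whatever is not heap main fork
    # or index is attributed to TOAST.
    toast_bytes = total_bytes - heap_main_bytes - index_bytes

    if dead_ratio > dead_ratio_limit:
        return "vacuum"   # bloat first: sizes are inflated until it is reclaimed
    if index_bytes > heap_main_bytes + toast_bytes:
        return "index"    # revisit m / lists or drop unused indexes
    if toast_bytes > heap_main_bytes:
        return "toast"    # the lever is the column type
    return "heap"
```

Checking bloat first is a deliberate choice in this sketch, because dead tuples distort every other size until they are reclaimed.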

The knobs that govern each layer, with production-oriented targets rather than shipped defaults:

Parameter	Layer	Default	Production recommendation	Notes
column type	Heap/TOAST	`vector` (4 B/dim)	`halfvec` (2 B/dim) for ≥768-dim transformer output	Halves heap + index; recall loss usually <0.5%
`default_toast_compression`	TOAST	`pglz`	`lz4` (or leave — floats barely compress)	Low value for dense vectors; matters for adjacent text columns
`m` (HNSW)	Index	16	16 for ≤1M rows; 24–32 only if recall demands	Each +8 adds meaningful index bytes per row
`lists` (IVFFlat)	Index	100	`rows / 1000` up to 1M rows; `sqrt(rows)` above 1M	See tuning IVFFlat lists
`autovacuum_vacuum_scale_factor`	Bloat	0.2	0.02–0.05 per-table for churny vector tables	0.2 lets a 10M-row table grow 2M dead tuples first
`autovacuum_vacuum_cost_limit`	Bloat	-1 (inherits `vacuum_cost_limit`, 200)	1000–2000	Lets vacuum keep pace with re-embedding writes
`fillfactor`	Heap	100	90–95 for update-heavy tables	Leaves page room, marginally helps HOT
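
Two of the rows above are easier to reason about as arithmetic. The sketch below computes a starting value for lists (rows / 1000 up to about 1M rows and sqrt(rows) beyond that, the commonly cited pgvector starting point) and the number of dead tuples autovacuum waits for, assuming the stock autovacuum_vacuum_threshold of 50. Both helper names are hypothetical.

```python
import math


def recommended_lists(rows: int) -> int:
    """Starting point for IVFFlat lists: rows/1000 up to 1M rows, sqrt(rows) above."""
    if rows <= 1_000_000:
        return max(1, rows // 1000)
    return math.isqrt(rows)


def autovacuum_trigger(rows: int, scale_factor: float = 0.2, threshold: int = 50) -> int:
    """Dead tuples needed before autovacuum starts: threshold + scale_factor * rows."""
    return threshold + round(scale_factor * rows)


if __name__ == "__main__":
    rows = 10_000_000
    print("lists:", recommended_lists(rows))
    print("default trigger:", autovacuum_trigger(rows))
    print("tuned trigger:", autovacuum_trigger(rows, scale_factor=0.02))
```

Running it for a 10M-row table shows why the per-table scale factor matters: the default waits for roughly two million dead tuples, the tuned value for about a tenth of that.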

Step-by-Step Implementation

The following sequence takes a table from a naive vector layout to a measured, lean footprint. Each step is runnable against PostgreSQL 15+ with pgvector 0.7+.

Step 1 — Establish the raw baseline. Compute the theoretical floor so you can compare it to measured reality and quantify the overhead multiple.

SQL

-- Theoretical raw bytes: element_size × dims × rows.
SELECT
  count(*)                          AS rows,
  1536                              AS dims,
  pg_size_pretty(count(*) * 1536 * 4) AS raw_float4,
  pg_size_pretty(pg_total_relation_size('documents')) AS measured_total
FROM documents;

Step 2 — Convert dense columns to halfvec. For 768–3072 dimension transformer embeddings, halfvec (IEEE-754 half precision) cuts per-element storage from 4 to 2 bytes across both heap and index. Do the conversion with an explicit cast. The cast rounds each component to half precision, so normalize upstream first to keep values well inside half-precision range and the change in direction small (see normalizing embeddings before pgvector insertion).

SQL

-- Add a halfvec column, backfill, then swap. Do this in a maintenance window.
ALTER TABLE documents ADD COLUMN embedding_h halfvec(1536);
UPDATE documents SET embedding_h = embedding::halfvec(1536);
ALTER TABLE documents DROP COLUMN embedding;
ALTER TABLE documents RENAME COLUMN embedding_h TO embedding;

Step 3 — Rebuild the index on the smaller type. The index must be recreated against the new column. Use CONCURRENTLY so the build does not block writes on a live table.

SQL

CREATE INDEX CONCURRENTLY documents_embedding_hnsw
  ON documents USING hnsw (embedding halfvec_cosine_ops)
  WITH (m = 16, ef_construction = 64);

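Step 4 — Reclaim bloat from the rewrite. The UPDATE in Step 2 roughly doubled the heap with dead tuples. VACUUM FULL rewrites the table compactly under an ACCESS EXCLUSIVE lock and rebuilds every index on it, so on a large table you can run it before Step 3 to avoid building the HNSW index twice. pg_repack does the same compaction online if you cannot take the lock.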

SQL

VACUUM (FULL, ANALYZE) documents;

Step 5 — Lock in per-table autovacuum. Prevent the bloat from returning under ongoing refreshes.

SQL

ALTER TABLE documents SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_cost_limit   = 1000
);

For batch backfills, ingest with COPY rather than row-by-row INSERT to minimize WAL volume and page fragmentation. A Python loader that streams half-precision vectors through COPY:

PYTHON

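import numpy as np
import psycopg
from pgvector import HalfVector  # pgvector-python 0.4+; older releases export it from pgvector.psycopg
from pgvector.psycopg import register_vector

def copy_halfvec(conn, rows):
    # rows: iterable of (id, np.ndarray float32, already L2-normalized)
    register_vector(conn)
    with conn.cursor() as cur:
        with cur.copy("COPY documents (id, embedding) FROM STDIN WITH (FORMAT BINARY)") as cp:
            # Binary COPY does no server-side casting, so declare the column types.
            # "bigint" is an assumption; use the actual type of documents.id.
            cp.set_types(["bigint", "halfvec"])
            for doc_id, vec in rows:
                # HalfVector converts to float16: 2 bytes/dim on disk
                cp.write_row((doc_id, HalfVector(np.asarray(vec, dtype=np.float32))))
    conn.commit()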

Validation & Recall Testing

Shrinking storage is only a win if recall holds. Validate both the footprint reduction and that the smaller type still returns the right neighbors.

Confirm the index is actually used (a bloated or mistyped index silently falls back to a sequential scan, which reads the whole heap and defeats the point):

SQL

EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <=> $1::halfvec(1536)
LIMIT 10;
-- Expect: "Index Scan using documents_embedding_hnsw".
-- A "Seq Scan" here means the planner ignored the index: check that the operator and
-- argument type match the index operator class (hnsw.ef_search does not change the plan).

Measure recall against an exact ground-truth from a brute-force scan. Run the exact query with the ANN index disabled, then compare the ANN result set:

PYTHON

import psycopg

QUERY = "SELECT id FROM documents ORDER BY embedding <=> %s::halfvec LIMIT %s"

def recall_at_k(conn, query_vec, k=10, ef_search=40):
    # SET LOCAL and set_config(..., true) only last until the transaction ends.
    with conn.transaction(), conn.cursor() as cur:
        # Ground truth: exact search (disable index scans for a true baseline).
        cur.execute("SET LOCAL enable_indexscan = off")
        cur.execute("SET LOCAL enable_bitmapscan = off")
        cur.execute(QUERY, (query_vec, k))
        exact = {r[0] for r in cur.fetchall()}

        # ANN result at the chosen ef_search.
        cur.execute("SET LOCAL enable_indexscan = on")
        cur.execute("SET LOCAL enable_bitmapscan = on")
        # SET cannot take bind parameters, so go through set_config() instead.
        cur.execute("SELECT set_config('hnsw.ef_search', %s, true)", (str(ef_search),))
        cur.execute(QUERY, (query_vec, k))
        approx = {r[0] for r in cur.fetchall()}

    # Divide by the number of true neighbors, which can be below k on small tables.
    return len(exact & approx) / len(exact) if exact else 0.0

Run recall_at_k across a held-out sample of queries before and after the halfvec migration; a drop of more than ~1 point at fixed ef_search means the precision loss is material for your data and you should either raise ef_search or keep full vector on that column.
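
The before/after comparison described above can be sketched as a small helper. It assumes you have already collected one recall value (between 0 and 1) per held-out query for each configuration; the function name and the default 1-point budget are illustrative.

```python
from statistics import mean


def recall_regressed(before, after, max_drop_points=1.0):
    """Compare mean recall before and after a change.

    before / after: per-query recall values in the range 0..1.
    Returns (regressed, drop_in_points), where a point is 0.01 of recall.
    """
    drop_points = (mean(before) - mean(after)) * 100
    return drop_points > max_drop_points, drop_points


if __name__ == "__main__":
    baseline = [1.0, 0.9, 1.0, 0.9]    # e.g. full-precision vector
    candidate = [1.0, 0.9, 0.9, 0.9]   # e.g. after the halfvec migration
    print(recall_regressed(baseline, candidate))
```

Keeping ef_search fixed across both runs matters here, since otherwise the comparison mixes the effect of the type change with the effect of the search parameter.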

Failure Modes & Gotchas

Sequential-scan fallback after a type change. Dropping the old column in Step 2 also drops its index, so every query is a full heap scan until the Step 3 build finishes. Afterwards the planner still reverts to a sequential scan whenever the query does not match the index operator class, for example ordering by <-> against a halfvec_cosine_ops index, or casting the column to another type inside ORDER BY. Latency degrades quietly under load rather than erroring. Catch it with the EXPLAIN check above and confirm the operator class with \d+ documents.
TOAST fragmentation masquerading as heap bloat. pg_relation_size('documents') reports only the main fork; a table that looks small there can hide gigabytes in its TOAST relation. Always diagnose with pg_total_relation_size and inspect the TOAST relation via pg_relation_size(reltoastrelid) from pg_class.
VACUUM reclaims but rarely shrinks the file. Plain VACUUM marks dead space reusable and returns space to the OS only when it can truncate empty pages at the end of the table, so df usually shows no change. To shrink the on-disk files use VACUUM FULL or pg_repack for the table and REINDEX for a bloated index, and plan for the lock or the online-repack tooling.
WAL amplification during backfill. A bulk UPDATE ... SET embedding = embedding::halfvec rewrites every row and every index entry, generating WAL proportional to the whole table and stalling replicas that cannot keep up. Batch by primary-key ranges, or rebuild into a new table and swap, rather than one monolithic statement.
Index build memory spill. HNSW builds that exceed maintenance_work_mem fall back to a slow, disk-based build and can leave a temporarily oversized index. Size maintenance_work_mem to hold the graph, and see asynchronous index build strategies for building without blocking writes.
Multi-tenant tables where one tenant dominates storage. Per-tenant row skew makes global capacity math wrong. If you partition or isolate by tenant, account for it in security boundaries for vector data and size per partition.
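
For the WAL amplification item above, batching by primary-key range can be sketched as follows. This assumes an integer primary key named id and the embedding_h column from Step 2; the helper names and the batch size are hypothetical, and conn is expected to be a psycopg connection.

```python
def pk_batches(min_id: int, max_id: int, batch_size: int):
    """Yield inclusive (lo, hi) id ranges covering min_id..max_id."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1


# Assumed schema: integer primary key "id", target column "embedding_h".
BACKFILL_SQL = (
    "UPDATE documents SET embedding_h = embedding::halfvec(1536) "
    "WHERE id BETWEEN %s AND %s AND embedding_h IS NULL"
)


def backfill(conn, min_id, max_id, batch_size=10_000):
    """Run the conversion one range at a time, committing after each batch."""
    for lo, hi in pk_batches(min_id, max_id, batch_size):
        with conn.cursor() as cur:
            cur.execute(BACKFILL_SQL, (lo, hi))
        conn.commit()  # short transactions let vacuum and replicas keep pace
```

The IS NULL filter makes the job safe to restart, and committing per batch keeps each transaction's WAL burst small enough for replicas to absorb.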

Monitoring & Alerting Hooks

Storage overhead is a slow-moving failure, so trend it rather than spot-check it. Export these as gauges to Prometheus/Grafana and alert on the ratios, not the absolutes.

SQL

-- Per-table storage + bloat gauges, one row per vector table.
SELECT
  relname                                             AS table,
  pg_total_relation_size(relid)                       AS total_bytes,
  pg_indexes_size(relid)                              AS index_bytes,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::numeric / NULLIF(n_live_tup,0), 3) AS dead_ratio,
  extract(epoch FROM now() - last_autovacuum)         AS secs_since_autovacuum
FROM pg_stat_user_tables
WHERE relname LIKE '%embedding%' OR relname = 'documents';

SQL

-- Index-to-table ratio: a rising ratio signals index bloat before latency moves.
-- pg_table_size includes TOAST, which is where large vectors actually live.
SELECT
  relname AS table,
  round(pg_indexes_size(relid)::numeric / NULLIF(pg_table_size(relid),0), 2) AS index_heap_ratio
FROM pg_stat_user_tables
WHERE relname = 'documents';

Recommended alert thresholds: page on dead_ratio > 0.15 (autovacuum is falling behind), warn on index_heap_ratio > 2.5 (an HNSW index has bloated past its expected multiplier and wants a REINDEX CONCURRENTLY), and warn when secs_since_autovacuum exceeds 3x your refresh interval. To correlate storage growth with write amplification, track these alongside the checkpoint counts in pg_stat_bgwriter (pg_stat_checkpointer on PostgreSQL 17+) and the pgstattuple dead-tuple length on a nightly cron. For a full worked projection across a large corpus, follow calculating pgvector storage requirements for 10M embeddings.
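
If you evaluate these thresholds in an exporter rather than in Prometheus rules, the logic could look like the sketch below. The function name and return shape are assumptions; the inputs correspond to the gauge columns from the queries above, and None stands for a NULL value such as a table that has never been autovacuumed.

```python
def storage_alerts(dead_ratio, index_heap_ratio, secs_since_autovacuum,
                   refresh_interval_secs):
    """Return a list of (severity, metric) pairs for thresholds that are breached.

    A None input means the metric was NULL and is skipped rather than alerted on.
    """
    alerts = []
    if dead_ratio is not None and dead_ratio > 0.15:
        alerts.append(("page", "dead_ratio"))
    if index_heap_ratio is not None and index_heap_ratio > 2.5:
        alerts.append(("warn", "index_heap_ratio"))
    if (secs_since_autovacuum is not None
            and secs_since_autovacuum > 3 * refresh_interval_secs):
        alerts.append(("warn", "secs_since_autovacuum"))
    return alerts


if __name__ == "__main__":
    # Daily refresh, last autovacuum four days ago, moderate bloat.
    print(storage_alerts(0.2, 1.9, 4 * 86_400, 86_400))
```

Skipping None is a simplification; you may prefer to treat a table that has never been autovacuumed as its own warning.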

FAQ

Why is my pgvector table 3x larger than dimensions × 4 × rows?

The naive formula counts only the raw float4 array. It ignores the ~28 bytes of per-tuple header and alignment padding, the out-of-line TOAST storage for vectors over ~2 KB, and — usually the biggest contributor — the ANN index, which for HNSW stores a full second copy of every vector plus its neighbor graph at 1.5x–2.5x the raw volume. Add MVCC dead tuples from any recent re-embedding and 3x is entirely normal. Attribute the bytes with pg_total_relation_size, pg_indexes_size, and pg_stat_user_tables before optimizing.

Does halfvec actually reduce disk usage or just memory?

Both. halfvec stores each element in 2 bytes instead of 4 on disk, in the heap, in TOAST, and inside the ANN index — so a 1536-dimension column drops from ~6 KB to ~3 KB of raw payload per row, and the HNSW index that copies those vectors shrinks proportionally. You must rebuild the index against the new type and run VACUUM FULL (or pg_repack) to release the space the conversion UPDATE left as dead tuples.

Will VACUUM FULL shrink my vector table on disk?

Yes — VACUUM FULL rewrites the table and its indexes into fresh, compact files and returns freed space to the operating system, unlike plain VACUUM, which only marks space reusable. The cost is an ACCESS EXCLUSIVE lock for the duration, which blocks all reads and writes. For a live table use pg_repack, which performs the same compaction online with only a brief lock at the end.

Does TOAST compression help with vector storage?

Almost never for the vector payload itself. float4/float16 bit patterns are high-entropy, so pglz or lz4 typically compress them by only a few percent while adding CPU on every read. TOAST compression still matters for adjacent text or JSON columns on the same row. The real storage lever for vectors is the element type (halfvec, sparsevec) and pruning unused indexes — not compression settings.

How do I tell whether the index or the heap is driving my overhead?

Run pg_relation_size('documents') for the heap main fork, pg_indexes_size('documents') for all indexes combined, and subtract both from pg_total_relation_size('documents') to isolate TOAST. Then break indexes out individually via pg_stat_user_indexes. If the index total exceeds the heap, you are index-bound and should revisit m/lists or drop unused indexes; if TOAST dominates, the lever is the column type; if dead tuples are high, it is a vacuum problem.

Related

Vector Data Type Selection — choosing vector, halfvec, or sparsevec and the storage boundaries of each.
Cosine vs L2 Distance Metrics — how metric choice interacts with index layout and normalization.
Security Boundaries for Vector Data — per-tenant isolation and its effect on per-partition sizing.
HNSW vs IVFFlat Algorithm Selection — the index topology that dominates the storage multiplier.
Calculating pgvector Storage Requirements for 10M Embeddings — a full worked capacity projection.
