Can I change m or ef_construction after the HNSW index is built?

No. Both m and ef_construction are build-time only and are frozen into the graph topology at creation. Changing either means building a new index with CREATE INDEX and the new values; a plain REINDEX rebuilds the graph with the parameters already stored on the index. Only hnsw.ef_search is adjustable at query time, per session or transaction.

Why does my HNSW index get ignored and the query does a Seq Scan?

The most common causes are an operator-class mismatch (querying with <-> against a vector_cosine_ops index), a dimension mismatch between the query literal and the column, or cost settings like random_page_cost still tuned for spinning disks. Confirm the operator and cast width, then re-run EXPLAIN.

Should I use CREATE INDEX CONCURRENTLY for HNSW on a production table?

Yes. A plain CREATE INDEX holds an exclusive lock for the entire build, which can be hours on a large embedding column. CONCURRENTLY builds without blocking DML at the cost of a second table pass and higher failure sensitivity to long-running locks, so run it in a low-traffic window with retry logic.

How do I raise recall without rebuilding an HNSW index?

Raise hnsw.ef_search first, since it is a free query-time change that trades latency for recall. If recall plateaus below target even at a high ef_search, the graph itself is too sparse and you must rebuild with a larger ef_construction or m.

Step-by-Step HNSW Index Creation for Production Workloads

Building a Hierarchical Navigable Small World (HNSW) index on a live pgvector table is a calibrated, five-step procedure — not a single DDL statement. This page walks through the exact sequence to validate the embedding schema, choose build-time parameters, run the build without blocking traffic, confirm the planner actually uses the index, and tune query-time behaviour for a production Approximate Nearest Neighbor (ANN) workload.


HNSW constructs a multi-layer proximity graph that trades predictable memory overhead for sub-linear ANN query latency, which makes it the default choice for read-heavy semantic search, retrieval-augmented generation, and recommendation serving. This guide assumes you have already settled the algorithm question using the HNSW vs IVFFlat algorithm selection framework and are committed to HNSW; if you are still weighing memory footprint against write amplification, resolve that first, because the build procedure below assumes HNSW is the right structure for your latency and recall targets.

Prerequisites

pgvector 0.5.0+ for stable HNSW graph serialization and concurrent build support; 0.7.0+ if you plan to build on halfvec to halve the working set. Verify with SELECT extversion FROM pg_extension WHERE extname = 'vector';.
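PostgreSQL 15+ (the baseline this guide assumes). The pg_stat_progress_create_index view used for phase reporting has existed since PostgreSQL 12, and parallel HNSW builds also require pgvector 0.6.0+.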
A table-owner or superuser role: CREATE INDEX on the target table requires ownership. SET maintenance_work_mem needs no special privilege and only affects the current session.
RAM headroom: the full HNSW graph must stay resident to search efficiently. Budget for the raw vectors plus roughly m edge references per node, and keep maintenance_work_mem × (1 + max_parallel_maintenance_workers) under free RAM during the build.
A settled distance operator class — the index binds to exactly one of vector_cosine_ops, vector_l2_ops, or vector_ip_ops. Decide the metric first with cosine vs L2 distance metrics, because a mismatched operator makes the planner ignore the index entirely.
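
The RAM headroom rule in the list above can be turned into a quick back-of-envelope check. This is a minimal sketch, assuming 4 bytes per float32 component and a placeholder 8 bytes per edge reference (the real per-edge cost depends on pgvector internals and is not stated in this guide). The function names are illustrative, not part of any library.

```python
def estimate_graph_bytes(rows, dims, m, bytes_per_component=4, bytes_per_edge=8):
    """Rough resident size: raw vectors plus about m edge references per node.

    bytes_per_edge is an assumed placeholder, not a pgvector constant.
    """
    vector_bytes = rows * dims * bytes_per_component
    edge_bytes = rows * m * bytes_per_edge
    return vector_bytes + edge_bytes


def build_fits_in_ram(maintenance_work_mem_bytes, workers, free_ram_bytes):
    """Apply the guide's rule: maintenance_work_mem x (1 + workers) < free RAM."""
    return maintenance_work_mem_bytes * (1 + workers) < free_ram_bytes


GiB = 1024 ** 3
size = estimate_graph_bytes(rows=10_000_000, dims=768, m=16)
print(f"estimated graph size: {size / GiB:.1f} GiB")
print("build fits:", build_fits_in_ram(8 * GiB, workers=2, free_ram_bytes=64 * GiB))
```

Treat the result as an order-of-magnitude figure for capacity planning; measure the real index size with pg_relation_size once a test build exists.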

The five-phase HNSW build workflow. A failed validation in phase 4 loops back to phase 2, because m and ef_construction are immutable and can only be changed by a rebuild.

Step-by-step procedure

1. Validate the schema and dimensionality

Before executing any index creation command, validate the embedding schema against pgvector constraints. HNSW requires strictly fixed-dimensional vectors; mismatched dimensions or implicit type casting cause silent query fallbacks to sequential scans or catastrophic recall degradation. Define the table with explicit dimensionality and the metadata columns you will pre-filter on:

SQL

CREATE TABLE document_embeddings (
    id           UUID PRIMARY KEY,
    tenant_id    UUID NOT NULL,
    content_hash TEXT NOT NULL,
    embedding    vector(768) NOT NULL,
    metadata     JSONB,
    created_at   TIMESTAMPTZ DEFAULT NOW()
);
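
On the application side, the same fixed-width rule can be enforced before a vector ever reaches the database. The following is a sketch of a hypothetical helper, `to_vector_literal`, that rejects wrong-width input and builds the bracketed text literal pgvector accepts. The 768 default mirrors the column definition above, and rejecting non-finite values is a precaution chosen for this example.

```python
import math

EXPECTED_DIMS = 768  # must match vector(768) in the table definition


def to_vector_literal(values, dims=EXPECTED_DIMS):
    """Return a pgvector text literal, rejecting wrong-width or non-finite input."""
    values = [float(v) for v in values]
    if len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(v) for v in values) + "]"
```

Calling it at the ingestion boundary means a model or configuration change that alters the embedding width fails loudly in the pipeline rather than at query time.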

For workloads exceeding 100M rows, evaluate migrating the column to halfvec(768) (pgvector 0.7.0+) if your embedding model tolerates FP16 precision without measurable accuracy loss. This halves the index footprint, cuts WAL generation during bulk inserts, and lowers I/O pressure during graph traversal — the per-row storage math that drives this decision is worked through in pgvector storage overhead analysis, and the type trade-offs in vector data type selection. Ensure your PostgreSQL instance is compiled with SIMD support (pgvector uses AVX2/AVX-512 for distance calculations), since the CPU instruction set directly dictates build throughput and query latency.
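
To put a number on the halving claim, the sketch below compares raw column storage for the two types. It relies on the per-value sizes given in pgvector's documentation (4 × dimensions + 8 bytes for vector, 2 × dimensions + 8 bytes for halfvec) and deliberately ignores tuple headers, TOAST, and index overhead, so it is a lower bound rather than a sizing tool.

```python
def bytes_per_value(dims, column_type="vector"):
    """Storage for one value: component bytes times dimensions, plus an 8-byte header."""
    per_component = {"vector": 4, "halfvec": 2}[column_type]
    return per_component * dims + 8


def column_bytes(rows, dims, column_type="vector"):
    """Raw storage for the whole embedding column, excluding row and index overhead."""
    return rows * bytes_per_value(dims, column_type)


rows, dims = 100_000_000, 768
full = column_bytes(rows, dims, "vector")
half = column_bytes(rows, dims, "halfvec")
print(f"vector:  {full / 1e9:.1f} GB")
print(f"halfvec: {half / 1e9:.1f} GB")
```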

2. Choose build-time parameters (`m`, `ef_construction`)

The core creation statement exposes two build-time knobs that fix graph topology permanently:

SQL

CREATE INDEX idx_doc_embeddings_hnsw
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

m sets the maximum number of connections per node in each layer of the graph (pgvector allows up to 2 × m on the base layer). Raising it adds redundant pathways that improve recall, but resident memory grows roughly linearly with it. ef_construction controls the size of the dynamic candidate list during the build: higher values yield a more accurate graph at the cost of longer build time and more maintenance_work_mem pressure. Both are immutable after creation, so a wrong choice here means a full rebuild later.

A practical heuristic is ef_construction = m * 4 for baseline accuracy and m * 8 to m * 12 for high-recall semantic search; pgvector also requires ef_construction to be at least 2 * m. The full calibration methodology, including recall-versus-latency sweeps, lives in optimizing m and ef_construction parameters. Provision maintenance_work_mem to at least 25% of available RAM for the build session, but cap it below 32GB to avoid allocator overhead.
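
The heuristic can be encoded so that migrations do not hand-type the numbers. The sketch below is illustrative only: `suggest_ef_construction` and `create_index_sql` are hypothetical helpers, the "high_recall" profile uses the low end (m * 8) of the range given above, and the range checks reflect pgvector's documented limits as understood here (m from 2 to 100, ef_construction from 4 to 1000 and at least 2 * m), which should be verified against the installed version.

```python
def suggest_ef_construction(m, profile="baseline"):
    """m * 4 for baseline accuracy, m * 8 as a starting point for high recall."""
    multipliers = {"baseline": 4, "high_recall": 8}
    return m * multipliers[profile]


def create_index_sql(index_name, table, column, opclass, m, ef_construction):
    """Render the CREATE INDEX statement after sanity-checking the parameters.

    Identifiers are interpolated as-is, so pass only trusted, hard-coded names.
    """
    if not 2 <= m <= 100:
        raise ValueError("m must be between 2 and 100")
    if not 4 <= ef_construction <= 1000:
        raise ValueError("ef_construction must be between 4 and 1000")
    if ef_construction < 2 * m:
        raise ValueError("ef_construction must be at least 2 * m")
    return (
        f"CREATE INDEX CONCURRENTLY {index_name} ON {table} "
        f"USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


m = 16
print(create_index_sql(
    "idx_doc_embeddings_hnsw", "document_embeddings", "embedding",
    "vector_cosine_ops", m, suggest_ef_construction(m, "high_recall"),
))
```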

3. Build without blocking with `CONCURRENTLY`

Production tables cannot tolerate the exclusive lock that a plain CREATE INDEX takes for the full build duration. Use CREATE INDEX CONCURRENTLY so DML keeps flowing:

SQL

CREATE INDEX CONCURRENTLY idx_doc_embeddings_hnsw
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

CONCURRENTLY performs two passes over the table, needs extra temporary storage for graph construction, and is more likely to fail if a long-running transaction holds a conflicting lock. Orchestrate it during a low-traffic window and wrap the call in retry logic with exponential backoff for transient lock conflicts. The broader patterns — shadow-table builds, parallelism tuning, and recovery when a build dies mid-flight — are covered in asynchronous index build strategies; if the build times out before it finishes, follow resolving pgvector index build timeout errors.
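
The retry-with-backoff wrapper mentioned above might look like the following. It is a driver-agnostic sketch: `run_with_backoff` is a hypothetical helper that takes the build and the cleanup as callables, and it catches every exception for brevity, whereas a real orchestration script would narrow that to the lock-related errors its driver raises. The cleanup step exists because a failed concurrent build leaves an invalid index that has to be dropped before the next attempt.

```python
import time


def run_with_backoff(action, cleanup, attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call action(); on failure run cleanup(), wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            cleanup()  # e.g. drop the invalid index left by the failed build
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Example wiring with psycopg 3 (the connection string is a placeholder).
# autocommit is required because CREATE INDEX CONCURRENTLY cannot run in a transaction.
#
# with psycopg.connect("dbname=app", autocommit=True) as conn:
#     run_with_backoff(
#         action=lambda: conn.execute(CREATE_INDEX_SQL),
#         cleanup=lambda: conn.execute(
#             "DROP INDEX CONCURRENTLY IF EXISTS idx_doc_embeddings_hnsw"),
#     )
```

The `sleep` parameter is injectable so the backoff schedule can be unit-tested without waiting.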

Monitor progress from a second session while the build runs:

SQL

SELECT phase, blocks_done, blocks_total,
       tuples_done, tuples_total
FROM pg_stat_progress_create_index;
-- phases: "building index: loading tuples", "waiting for writers", ...

4. Validate that the planner uses the index

A build is only trustworthy once you confirm the planner actually chooses the HNSW structure. Run an EXPLAIN (ANALYZE, BUFFERS) against a representative query:

SQL

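-- The literal below is truncated for display. A real query vector must carry
-- all 768 components, otherwise the cast to vector(768) raises an error.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, metadata
FROM document_embeddings
ORDER BY embedding <=> '[0.12, -0.45, 0.98]'::vector(768)
LIMIT 20;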

Look for Index Scan using idx_doc_embeddings_hnsw in the plan and verify actual rows aligns with the LIMIT. A Seq Scan here means the index is being ignored — the three usual causes are a dimension mismatch between the query vector and the column, an operator-class mismatch (querying with <-> against a vector_cosine_ops index), or a cost setting like random_page_cost still tuned for spinning disks. Classifying these post-build failure states systematically is the subject of index validation error categorization.
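
The plan check and the operator check can be automated in a deployment script. The sketch below uses two hypothetical helpers: `operator_matches` encodes the operator-to-class pairing used in this guide, and `classify_plan` does a plain substring test on EXPLAIN text, which is a deliberately crude stand-in for parsing EXPLAIN (FORMAT JSON).

```python
# Query operator served by each operator class, as listed in this guide.
OPERATOR_FOR_OPCLASS = {
    "vector_cosine_ops": "<=>",
    "vector_l2_ops": "<->",
    "vector_ip_ops": "<#>",
}


def operator_matches(opclass, query_operator):
    """True when the query operator is the one the index's operator class serves."""
    return OPERATOR_FOR_OPCLASS.get(opclass) == query_operator


def classify_plan(plan_text, index_name):
    """Classify EXPLAIN text as 'index_scan', 'seq_scan', or 'other'."""
    if f"Index Scan using {index_name}" in plan_text:
        return "index_scan"
    if "Seq Scan" in plan_text:
        return "seq_scan"
    return "other"
```

With a database cursor, the plan text would be the rows returned by the EXPLAIN statement joined with newlines; a "seq_scan" result is the cue to walk through the three causes listed above.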

5. Tune query-time behaviour and lifecycle

Build-time parameters are frozen, but query-time accuracy is governed by ef_search, adjustable per session or transaction:

SQL

SET LOCAL hnsw.ef_search = 128;
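
SET LOCAL only takes effect inside a transaction block; issued outside one it has no lasting effect. The sketch below shows one way an application might wrap a query so the setting is scoped to it. The workload names and the `scoped_query` helper are hypothetical, and the two ef_search values are the ones this section recommends for autocomplete and batch scoring.

```python
# Workload profiles and their ef_search values (names are illustrative).
EF_SEARCH_BY_WORKLOAD = {
    "autocomplete": 64,
    "batch_scoring": 256,
}


def scoped_query(workload, query_sql):
    """Return the statements to run, in order, so ef_search applies only to this query."""
    ef_search = int(EF_SEARCH_BY_WORKLOAD[workload])
    return [
        "BEGIN;",
        f"SET LOCAL hnsw.ef_search = {ef_search};",
        query_sql,
        "COMMIT;",  # the setting reverts when the transaction ends
    ]


for statement in scoped_query(
    "autocomplete",
    "SELECT id FROM document_embeddings ORDER BY embedding <=> $1 LIMIT 20;",
):
    print(statement)
```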

Higher ef_search raises recall at the cost of latency. Route it by workload: ef_search = 64 for real-time autocomplete, ef_search = 256 for batch recommendation scoring.

Over time, heavy UPDATE/DELETE cycles accumulate dead tuples in the graph, so schedule VACUUM (ANALYZE) during off-peak hours and trigger REINDEX INDEX CONCURRENTLY once the dead-tuple ratio (n_dead_tup against n_live_tup + n_dead_tup in pg_stat_user_tables, used here as a proxy for graph fragmentation) crosses roughly 15%. For pipelines where the embedding model itself changes, wrap drift detection into CI: if cosine similarity between old and new model outputs drops below 0.92, flag the index for a full rebuild rather than incremental updates.
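
The drift gate can be sketched in a few lines. The guide does not say how the similarity is aggregated, so this example assumes the mean cosine similarity over a fixed set of probe texts embedded by both models, and it assumes both models emit vectors of the same width. `needs_full_rebuild` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def needs_full_rebuild(old_vectors, new_vectors, threshold=0.92):
    """Flag a rebuild when mean old-vs-new similarity falls below the threshold.

    old_vectors[i] and new_vectors[i] must embed the same probe text.
    """
    sims = [cosine_similarity(o, n) for o, n in zip(old_vectors, new_vectors)]
    return sum(sims) / len(sims) < threshold
```

A CI job could fail the pipeline, or open a rebuild ticket, whenever this returns True for the probe set.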

Parameter reference

Name	Type	Default	Production recommendation	Notes
`m`	int	`16`	`16`–`32`	Max edges per node; build-time only. Raising it improves recall but grows resident memory and build time. Cannot change without a rebuild.
`ef_construction`	int	`64`	`128`–`256`	Build-time candidate-list size. Dominant build-cost knob; a good default is `m * 8`. Immutable after creation.
`hnsw.ef_search`	int	`40`	`64`–`256`	Query-time candidate list. Set per session/transaction; trade latency for recall by workload profile.
`maintenance_work_mem`	memory	`64MB`	`2GB`–`16GB` (≤ 32GB)	Sized to the graph working set; too low spills the build to disk and stalls it.
`max_parallel_maintenance_workers`	int	`2`	`≤ physical cores`	Speeds HNSW layer construction during the build.
operator class	ident	—	`vector_cosine_ops` (normalized text embeddings)	Must match the query operator (`<=>`, `<->`, `<#>`) or the planner falls back to `Seq Scan`.
column type	type	`vector(d)`	`halfvec(d)` above ~100M rows	`halfvec` (pgvector 0.7.0+) halves index size and WAL if FP16 precision is acceptable.
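
The two session-level rows of the table can be applied by a small guard that refuses values outside the recommended envelope. This is a sketch with a hypothetical helper name; the 32GB ceiling and the physical-core limit are taken from the table above, and the core count is passed in explicitly because os.cpu_count() reports logical rather than physical cores.

```python
def build_session_settings(mem_gb, workers, physical_cores):
    """Return SET statements for the build session, enforcing the table's limits."""
    if not 0 < mem_gb <= 32:
        raise ValueError("maintenance_work_mem should be above 0 and at most 32GB")
    if not 0 <= workers <= physical_cores:
        raise ValueError("workers should not exceed the physical core count")
    return [
        f"SET maintenance_work_mem = '{mem_gb}GB';",
        f"SET max_parallel_maintenance_workers = {workers};",
    ]


for statement in build_session_settings(mem_gb=8, workers=4, physical_cores=8):
    print(statement)
```

Both settings are plain session SETs, so they should be issued on the same connection that runs the CREATE INDEX.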

Verification

Confirm the index built cleanly and is in service — a build that failed under CONCURRENTLY can leave an invalid index behind that silently forces sequential scans:

SQL

SELECT c.relname AS index_name,
       i.indisvalid, i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'idx_doc_embeddings_hnsw';
-- expect: indisvalid = t, indisready = t

If indisvalid is f, drop the failed index (DROP INDEX CONCURRENTLY idx_doc_embeddings_hnsw;) and rerun step 3 — a concurrent build never cleans up after itself. Once valid, refresh statistics and re-run the EXPLAIN from step 4 to confirm an Index Scan, then sanity-check recall against a brute-force baseline:

PYTHON

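import psycopg  # psycopg 3; `cur` is a cursor from psycopg.connect(...)

def recall_at_k(cur, query_vec, k=20, ef_search=128):
    """Return recall@k of the HNSW index against an exact scan for one query.

    Run inside an open transaction so the SET LOCAL settings take effect.
    """
    lit = "[" + ",".join(map(str, query_vec)) + "]"
    sql = ("SELECT id FROM document_embeddings "
           "ORDER BY embedding <=> %s::vector LIMIT %s;")

    # Ground truth: disable index scans so the planner runs an exact scan.
    cur.execute("SET LOCAL enable_indexscan = off;")
    cur.execute(sql, (lit, k))
    truth = {r[0] for r in cur.fetchall()}

    # Approximate result through the HNSW index.
    cur.execute("SET LOCAL enable_indexscan = on;")
    # SET does not accept bind parameters, so use set_config(name, value, is_local).
    cur.execute("SELECT set_config('hnsw.ef_search', %s, true);", (str(ef_search),))
    cur.execute(sql, (lit, k))
    got = {r[0] for r in cur.fetchall()}
    # Divide by the rows actually found in case the table holds fewer than k.
    return len(truth & got) / max(len(truth), 1)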

# A healthy production HNSW index should return recall@20 ≥ 0.95
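
A single query is a noisy estimate, so the 0.95 target is better judged on an average over many queries. The sketch below aggregates per-query results without touching the database. It assumes each pair holds the exact and approximate id sets for one query, collected for example by a variant of the function above that returns both sets; the helper names are hypothetical.

```python
def recall(truth, got):
    """Fraction of the exact neighbours that the approximate search returned."""
    truth, got = set(truth), set(got)
    if not truth:
        raise ValueError("ground-truth set is empty")
    return len(truth & got) / len(truth)


def mean_recall(pairs):
    """Average recall over an iterable of (truth_ids, got_ids) pairs."""
    scores = [recall(truth, got) for truth, got in pairs]
    return sum(scores) / len(scores)


def meets_target(pairs, target=0.95):
    """True when the averaged recall reaches the production target."""
    return mean_recall(pairs) >= target
```

Sampling the query vectors from real traffic rather than from the indexed rows gives a more honest number, since stored rows trivially find themselves.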

Troubleshooting

Planner falls back to Seq Scan. Almost always an operator-class or dimension mismatch. Confirm the query operator (<=> cosine, <-> L2, <#> inner product) matches the class the index was built with, and that the query literal is cast to the exact vector(d) width. Full classification is in index validation error categorization.
NOTICE: hnsw graph no longer fits into maintenance_work_mem after N tuples. The graph working set exceeded the build memory ceiling, so the rest of the build continues far more slowly. Raise maintenance_work_mem for the build session, or move the column to halfvec to shrink the working set, staying under maintenance_work_mem × (1 + workers) < free RAM.
Build runs for hours then the client disconnects. A statement_timeout, pooler idle cutoff, or lock contention killed it — not a graph problem. Diagnose and recover with resolving pgvector index build timeout errors before simply disabling every timeout.
Recall is poor despite a valid index. ef_search is too low for the workload, or ef_construction/m were undersized at build time. Raise hnsw.ef_search first (it is free to change); if that plateaus below target, rebuild with a higher ef_construction per optimizing m and ef_construction parameters.
Query latency climbs over weeks. Dead tuples from heavy UPDATE/DELETE are fragmenting the graph. Check the dead-tuple ratio via SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'document_embeddings';, then VACUUM (ANALYZE) and, above a ~15% dead-tuple ratio, REINDEX INDEX CONCURRENTLY idx_doc_embeddings_hnsw;.
CREATE INDEX CONCURRENTLY cannot run inside a transaction block. A migration framework wrapped the build in BEGIN/COMMIT. Run it as a standalone autocommit statement or set the framework’s disable-DDL-transaction flag for that step.
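
For the latency-creep case above, the maintenance decision can be scripted. This sketch takes the n_live_tup and n_dead_tup counters that pg_stat_user_tables exposes for the table and applies the roughly 15% threshold from this guide. The table-level ratio is only a proxy, since PostgreSQL does not report fragmentation of the graph itself, and `maintenance_action` is a hypothetical helper.

```python
def dead_tuple_ratio(n_live_tup, n_dead_tup):
    """Share of dead tuples among all tuples; 0.0 for an empty table."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


def maintenance_action(n_live_tup, n_dead_tup, reindex_threshold=0.15):
    """Pick the statement to run: REINDEX above the threshold, VACUUM otherwise."""
    if dead_tuple_ratio(n_live_tup, n_dead_tup) > reindex_threshold:
        return "REINDEX INDEX CONCURRENTLY idx_doc_embeddings_hnsw;"
    return "VACUUM (ANALYZE) document_embeddings;"
```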

Related

Resolving pgvector index build timeout errors: recover a build that dies before the graph materializes
Tuning IVFFlat lists for high-throughput similarity search: the parallel build procedure for the other ANN algorithm
Optimizing m and ef_construction parameters: calibrate the build knobs chosen in step 2
Index validation error categorization: classify the failure states surfaced in step 4