How to Choose Between Cosine and L2 for Semantic Search

This page gives you a deterministic procedure for deciding whether to index a pgvector column with vector_cosine_ops (<=>) or vector_l2_ops (<->) for a semantic search workload. It resolves the choice from three measurable inputs — your embedding model’s normalization contract, the L2-norm distribution of your corpus, and your index and latency budget — so the metric becomes a locked infrastructure decision rather than a recall regression you discover weeks after launch.

Prerequisites

pgvector 0.5.0+: required for HNSW indexes. Confirm with SELECT extversion FROM pg_extension WHERE extname = 'vector';. If you plan to use halfvec, you need pgvector 0.7.0+.
PostgreSQL 15+: the version this procedure assumes. Index build progress is reported through pg_stat_progress_create_index, which is useful while building the competing indexes for the recall comparison in Step 4. Parallel HNSW builds also depend on the pgvector release (0.6.0+), not only on the PostgreSQL version.
A representative sample: at least 10,000 production embeddings loaded into a table (or available as a NumPy array) so the norm-distribution diagnostic is statistically meaningful.
A ground-truth label set: a few hundred query/relevant-document pairs to measure recall@K per metric; without it you are choosing blind.
Python 3.9+ with numpy (and optionally psycopg or asyncpg) for the distribution and validation snippets.
A settled vector column type: decide between vector and halfvec first with vector data type selection, because the operator class must match the column type you commit to (for example, halfvec_cosine_ops rather than vector_cosine_ops on a halfvec column).

Step-by-step procedure

1. Diagnose the embedding distribution and model contract

Before configuring any operator class, inspect the raw output of your embedding model. Many modern transformer encoders (OpenAI text-embedding-3, Cohere embed, SentenceTransformers) emit L2-normalized vectors, but normalization depends on the specific model and pooling configuration, so confirm it with the diagnostic in this step rather than assuming it. When vectors are unit-normalized, cosine distance and L2 distance rank results identically: they are monotonically related via ||u - v||² = 2 - 2·cos(u, v), so the choice collapses to an index-performance question rather than a recall question.
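
The equivalence on unit vectors can be checked numerically. The sketch below uses synthetic random vectors as stand-ins for embeddings (an assumption, not production data), normalizes them, and tests both the identity and whether the two metrics produce the same ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embeddings: 1,000 random 64-d vectors scaled to unit length.
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = rng.normal(size=64)
query /= np.linalg.norm(query)

cos_sim = corpus @ query                        # cos(u, v), valid because both sides are unit length
l2_sq = np.sum((corpus - query) ** 2, axis=1)   # ||u - v||^2

# Identity for unit vectors: ||u - v||^2 = 2 - 2*cos(u, v)
identity_holds = bool(np.allclose(l2_sq, 2.0 - 2.0 * cos_sim))

# Ranking by ascending cosine distance (1 - cos) vs ascending squared L2 distance
same_ranking = bool(np.array_equal(np.argsort(1.0 - cos_sim), np.argsort(l2_sq)))

print(identity_holds, same_ranking)
```

Running the same check on a sample of your own embeddings without the normalization lines is a quick way to see whether the two metrics already disagree on your data.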

If instead you run legacy models, domain-finetuned encoders, or raw mean-pooled token outputs, magnitude often carries semantic weight. L2 preserves absolute vector length differences; cosine projects every vector onto the unit hypersphere and discards magnitude entirely. Measure which regime you are in by profiling the norm distribution across your sample:

PYTHON

import numpy as np

# embeddings: (N, D) float32 array pulled from your model or table
norms = np.linalg.norm(embeddings, axis=1)
print(f"N={len(norms)}  mean={norms.mean():.4f}  std={norms.std():.4f}")
print(f"min={norms.min():.4f}  max={norms.max():.4f}")

Read the result against a simple rule: if std(||v||) < 0.05 the vectors are effectively normalized already and cosine vs L2 is a tie on recall; if std(||v||) > 0.15 magnitude is meaningful and cosine will silently degrade recall on scale-sensitive queries. Anything in between is a grey zone where you should normalize at ingestion and treat the corpus as unit-length — the approach detailed in normalizing embeddings before pgvector insertion.
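
The rule above can be written as a small helper. This is a minimal sketch: the function name and the regime labels are hypothetical, and the 0.05 and 0.15 cut-offs are simply the rule-of-thumb thresholds stated above, exposed as parameters so you can adjust them.

```python
import numpy as np

def classify_norm_regime(embeddings, low=0.05, high=0.15):
    """Label a sample by the standard deviation of its L2 norms.

    Returns 'normalized', 'grey-zone', or 'magnitude-bearing'
    (hypothetical labels used only in this sketch).
    """
    norms = np.linalg.norm(np.asarray(embeddings, dtype=np.float64), axis=1)
    std = float(norms.std())
    if std < low:
        return "normalized"         # cosine and L2 tie on recall
    if std > high:
        return "magnitude-bearing"  # cosine would discard real signal
    return "grey-zone"              # normalize at ingestion, treat as unit length

# Toy inputs: every row has norm 1.0 in the first call; norms are 1, 2, 3 in the second.
print(classify_norm_regime([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]))
print(classify_norm_regime([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]))
```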

2. Map the metric to a pgvector operator class

The operator class you name in CREATE INDEX is what binds the metric to the index — it is never inferred from the query. A query whose operator does not match the index operator class will not use that index at all, and the planner falls back to a sequential scan. Choose the class from Step 1:

Directional embeddings, normalized or grey-zone → cosine. Use vector_cosine_ops with the <=> operator. This is the correct default for text retrieval, retrieval-augmented generation, and recommendation.
Magnitude-bearing embeddings (std(||v||) > 0.15, image feature maps, sensor telemetry) → L2. Use vector_l2_ops with the <-> operator.
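Already unit-normalized and latency-critical → consider inner product (vector_ip_ops, <#>), which skips the norm computation and division that cosine performs and ranks identically to cosine on unit vectors. Note that <#> returns the negative inner product, so an ascending ORDER BY still puts the best match first.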
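
The mapping in this list can be captured as a lookup so the operator class and the query operator are always chosen together. This is an illustrative sketch: the regime labels and the function name are hypothetical, while the operator class and operator names are the ones listed above.

```python
# Operator class and query operator per metric, as named in the list above.
METRICS = {
    "cosine": ("vector_cosine_ops", "<=>"),
    "l2": ("vector_l2_ops", "<->"),
    "inner_product": ("vector_ip_ops", "<#>"),
}

def choose_metric(regime, latency_critical=False):
    """Map a Step 1 regime label (hypothetical names) to a metric key."""
    if regime == "magnitude-bearing":
        return "l2"
    if regime == "normalized" and latency_critical:
        return "inner_product"
    # Normalized or grey-zone directional embeddings default to cosine.
    return "cosine"

metric = choose_metric("grey-zone")
opclass, operator = METRICS[metric]
print(metric, opclass, operator)
```

Keeping both values in one place makes it harder for an index to be built with one operator class while queries are written with another operator.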

Declare the operator class explicitly so a schema migration never falls back to a default metric:

SQL

-- Cosine on HNSW (directional / text embeddings)
CREATE INDEX idx_semantic_cosine ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 256);

-- L2 on IVFFlat (magnitude-bearing embeddings)
CREATE INDEX idx_semantic_l2 ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);

Because the metric reshapes the index geometry — HNSW builds its proximity graph on the unit hypersphere for cosine but in absolute position space for L2 — pair this decision with the HNSW vs IVFFlat algorithm selection procedure before you commit the build.

3. Weigh the build, latency, and pipeline cost

Metric choice cascades into where compute lands and how the index behaves under load. Cosine on pre-normalized vectors clusters tightly on the hypersphere, which shortens HNSW graph traversal and lowers p95 latency at concurrency. L2 on unnormalized embeddings usually needs a larger ef_search to hold the same recall. Because ef_search is a query-time setting, that cost shows up as extra CPU and buffer reads per query, not as write or WAL overhead during bulk inserts.

Cosine (pre-normalize at ingestion): shifts cost to the pipeline (numpy/torch batch normalization), cuts query-time CPU, and lets pgvector exploit contiguous, lower-variance page layouts. If you index with vector_ip_ops on those same unit vectors you shave the per-comparison division as well.
L2 (raw ingestion): simpler ingestion, but defers cost to query execution and needs careful lists / ef_search tuning. Calibrate those knobs with optimizing m and ef_construction parameters.
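
Pre-normalizing at ingestion, as described for the cosine option, can be sketched as a batch step in the pipeline. The function name and the eps guard are assumptions for illustration. Rejecting zero-length rows is a design choice of this sketch, since such rows have no direction to normalize.

```python
import numpy as np

def normalize_rows(batch, eps=1e-12):
    """Scale each row of an (N, D) batch to unit L2 length before insertion."""
    batch = np.asarray(batch, dtype=np.float32)
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    if np.any(norms < eps):
        # A zero vector has no direction; fail loudly instead of emitting NaNs.
        raise ValueError("zero-length embedding cannot be normalized")
    return batch / norms

unit = normalize_rows([[3.0, 4.0], [0.0, 2.0]])
print(np.linalg.norm(unit, axis=1))  # every norm should be approximately 1.0
```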

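Normalization does not change the stored size of a vector (a vector column uses 4 bytes per dimension plus a small header), so index size is driven by dimension count, column type, and m or lists rather than by the metric. Quantify it with calculating pgvector storage requirements before you size the instance.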

4. Validate recall against ground truth

Never promote the metric on theory alone. For each candidate operator, compute exact (brute-force) nearest neighbours by running the same ORDER BY query with enable_indexscan = off, then compare the index results to that ground truth:

PYTHON

import numpy as np

def recall_at_k(index_ids, truth_ids, k):
    hits = [len(set(a[:k]) & set(b[:k])) / k
            for a, b in zip(index_ids, truth_ids)]
    return float(np.mean(hits))

# index_ids: top-k ids returned by the ANN index per query
# truth_ids: top-k ids from an exact ORDER BY distance scan per query
print(f"recall@10 = {recall_at_k(index_ids, truth_ids, 10):.3f}")

Target recall@10 > 0.85 for production search. If the cosine index misses the target on grey-zone data, loop back to Step 1, normalize at ingestion, and rebuild — do not compensate by inflating ef_search, which trades latency for a problem that normalization fixes at the source.
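
When the sample fits in memory, the exact baseline can also be computed offline instead of in PostgreSQL. The sketch below is an alternative to the enable_indexscan = off approach, not a replacement for it. The function name is hypothetical, and the L2 branch materializes a (queries × corpus × dimensions) array, so it only suits small samples.

```python
import numpy as np

def exact_top_k(queries, corpus, ids, k, metric="cosine"):
    """Brute-force top-k ids per query under cosine or L2 distance."""
    queries = np.asarray(queries, dtype=np.float64)
    corpus = np.asarray(corpus, dtype=np.float64)
    if metric == "cosine":
        q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        dist = 1.0 - q @ c.T                       # cosine distance
    elif metric == "l2":
        diff = queries[:, None, :] - corpus[None, :, :]
        dist = np.linalg.norm(diff, axis=2)        # Euclidean distance
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # Stable sort keeps corpus order among exact ties.
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return [[ids[j] for j in row] for row in order]

# Toy corpus: ids 101 and 103 point the same way but differ in length.
corpus = [[1.0, 0.0], [0.0, 1.0], [10.0, 0.0]]
ids = [101, 102, 103]
print(exact_top_k([[2.0, 0.0]], corpus, ids, 2, "cosine"))
print(exact_top_k([[2.0, 0.0]], corpus, ids, 2, "l2"))
```

The toy corpus shows the two metrics disagreeing once magnitudes differ, which is the situation Step 1 is meant to detect.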

Parameter reference

Parameter	Type	Default	Production recommendation	Notes
operator class	identifier	`vector_l2_ops` (index default)	`vector_cosine_ops` for text/RAG; `vector_l2_ops` for magnitude-bearing data	Must match the operator (`<=>` / `<->` / `<#>`) used in queries or the index is skipped.
`m` (HNSW)	int	`16`	`16`–`24` for cosine on normalized text; up to `32`–`48` for dense L2 spaces	Higher `m` raises recall and memory; cosine needs less because the hypersphere is tighter.
`ef_construction` (HNSW)	int	`64`	`200`–`400`	Bigger graph, slower build, higher recall ceiling; scale with `m`.
`ef_search` (HNSW, query)	int	`40`	`50`–`100` cosine; scale higher for L2 to match recall	Runtime knob via `SET hnsw.ef_search`; the cheapest recall/latency lever.
`lists` (IVFFlat)	int	`100`	`sqrt(N)` for cosine; `~1.5·sqrt(N)` for higher-variance L2	Build after the table is loaded, because centroids are computed from the rows present at build time; too few lists collapses recall.
`probes` (IVFFlat, query)	int	`1`	`sqrt(lists)` as a starting point	`SET ivfflat.probes`; too low silently drops recall on L2 partitions.
ingest normalization	bool	off	on when `std(
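
The IVFFlat starting points in the table reduce to two square roots. The helper below is a minimal sketch of that arithmetic. Its name and the high_variance_l2 flag are hypothetical, and the results are starting values to validate against recall, not tuned settings.

```python
import math

def ivfflat_starting_point(n_rows, high_variance_l2=False):
    """Starting lists/probes from the table's rules of thumb."""
    factor = 1.5 if high_variance_l2 else 1.0
    lists = max(1, round(factor * math.sqrt(n_rows)))   # sqrt(N) or ~1.5*sqrt(N)
    probes = max(1, round(math.sqrt(lists)))            # sqrt(lists)
    return {"lists": lists, "probes": probes}

print(ivfflat_starting_point(1_000_000))
print(ivfflat_starting_point(1_000_000, high_variance_l2=True))
```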

Verification

Confirm the planner actually uses the metric-matched index and is not silently sequential-scanning:

SQL

SET hnsw.ef_search = 80;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <=> '[...query vector...]'
LIMIT 10;

A healthy plan shows an Index Scan using idx_semantic_cosine node (not Seq Scan) and an operator in the sort key that matches the index’s operator class. If you built the index with vector_cosine_ops, the query must use <=>; swapping in <-> here will drop to a sequential scan even though the index exists.
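
The same check can be automated in a test suite by inspecting the text that EXPLAIN returns. The sketch below assumes you have already fetched the plan as a string. The function name is hypothetical, and the two sample plans are abbreviated illustrations, not real output.

```python
import re

def plan_uses_index(plan_text, index_name):
    """True if the plan scans index_name and contains no Seq Scan node."""
    uses_index = re.search(
        rf"Index (Only )?Scan using {re.escape(index_name)}\b", plan_text
    ) is not None
    has_seq_scan = re.search(r"\bSeq Scan\b", plan_text) is not None
    return uses_index and not has_seq_scan

# Abbreviated, illustrative plan text (not captured from a real server).
good_plan = (
    "Limit\n"
    "  ->  Index Scan using idx_semantic_cosine on documents\n"
    "        Order By: (embedding <=> '[...]'::vector)\n"
)
bad_plan = (
    "Limit\n"
    "  ->  Sort\n"
    "        Sort Key: ((embedding <-> '[...]'::vector))\n"
    "        ->  Seq Scan on documents\n"
)
print(plan_uses_index(good_plan, "idx_semantic_cosine"))
print(plan_uses_index(bad_plan, "idx_semantic_cosine"))
```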

Troubleshooting

Recall collapses after switching to cosine. The corpus is magnitude-bearing (std(||v||) > 0.15) and cosine discarded that signal. Re-run the Step 1 diagnostic; either revert to vector_l2_ops or, if magnitude is noise, normalize at ingestion and rebuild the cosine index.
Query ignores the index and runs Seq Scan. The query operator does not match the index operator class (e.g. <-> against a vector_cosine_ops index), or an implicit cast changed the vector dimension. Align the operator, and check the column type with \d documents.
p95 latency spikes only under concurrency. ef_search (HNSW) or probes (IVFFlat) is set too high to prop up a metric mismatch. Fix the metric first, then tune the runtime knob down; verify with EXPLAIN (ANALYZE, BUFFERS) under pgbench or k6 load.
Distance thresholds behave differently per tenant. L2 baselines drift when embedding magnitude varies by domain or language, which breaks threshold-based filters and row-level access controls. Prefer cosine for consistent cross-tenant thresholds, and isolate tenants as described in securing pgvector tables with row-level security.
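IVFFlat recall is unstable right after build. The index was probably built before the table held representative data, so the k-means centroids do not reflect the corpus. ANALYZE refreshes planner statistics but does not recompute centroids; rebuild the index (REINDEX, or drop and re-create it) after loading, confirm lists is at least sqrt(N), and raise probes until recall@10 clears the target.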

Related

Normalizing Embeddings Before pgvector Insertion: enforce unit length so cosine and inner product behave predictably.
Step-by-Step HNSW Index Creation for Production Workloads: build the index once the metric is settled.
Tuning IVFFlat Lists for High-Throughput Similarity Search: size lists and probes for your chosen metric.