Building a Resilient Python Embedding Pipeline with Celery

This page walks through the configuration that keeps a distributed embedding pipeline from losing work when an embedding provider throttles you, a worker dies mid-task, or a poison payload jams a queue. It shows how to wire Celery acknowledgment semantics, jittered retries, a dead-letter path, and idempotent upserts so that millions of chunks each land in a pgvector table as exactly one row, without index bloat or silent drops.

Default Celery settings optimize for throughput on cheap, replayable jobs — the opposite of an embedding ingestion pipeline, where every task holds an expensive API call or a GPU forward pass and a lost message means a missing vector. Resilience here is engineered explicitly through acknowledgment mode, prefetch control, retry policy, and a composite idempotency key that survives requeues.

Prerequisites

PostgreSQL 15+ with the pgvector 0.5+ extension installed (CREATE EXTENSION vector;)
Celery 5.3+ with a durable broker — RabbitMQ 3.12+ (quorum queues) or Redis 7+ with AOF persistence
A result backend only if you consume task results; disable it (task_ignore_result = True) for fire-and-forget ingestion to spare broker memory
Worker hosts sized so each concurrency slot has headroom for one embedding batch in VRAM/RAM (a text-embedding-3-large vector has 3,072 float32 dimensions by default, about 12 KB per vector before overhead)
A document_chunks table keyed on (doc_id, chunk_index) with a content_hash column — the same identity contract used upstream in batch chunking strategies for embeddings
Chunk payloads already normalized in shape (text + metadata) before they enter the queue
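
The sizing prerequisite above comes down to simple arithmetic. A rough sketch, assuming pgvector's documented layout of 4 bytes per dimension plus 8 bytes of header, and the 3,072-dimension default output of text-embedding-3-large; `vector_bytes` and `batch_bytes` are hypothetical helper names, not part of any library:

```python
def vector_bytes(dimensions: int) -> int:
    """Approximate stored size of one pgvector `vector` value, in bytes."""
    return 4 * dimensions + 8  # one float32 per dimension plus a small header


def batch_bytes(dimensions: int, batch_size: int) -> int:
    """Raw float32 payload a worker slot holds for one batch of embeddings."""
    return 4 * dimensions * batch_size


dims = 3072  # assumption: default output size of text-embedding-3-large
print(vector_bytes(dims))              # bytes per stored vector
print(batch_bytes(dims, 256) / 2**20)  # MiB of raw floats for a 256-item batch
```

Multiply the per-batch figure by the worker's concurrency to get a floor for memory per host; Python object overhead and the request payload itself come on top.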

Step-by-Step Procedure

1. Configure the broker and worker for message safety

Message loss during a worker crash is a configuration choice, not bad luck. Set late acknowledgment so a task is only removed from the broker after it finishes, and pin prefetch to one so a single stalled embedding job cannot hoard buffered messages that another worker could drain.

PYTHON

# celeryconfig.py
broker_url = "amqp://ingest:***@rabbitmq:5672//"
task_acks_late = True
task_reject_on_worker_lost = True
worker_prefetch_multiplier = 1
task_ignore_result = True

# Route poison/exhausted tasks to a dedicated exchange
task_queues = {
    "embeddings": {"exchange": "embeddings", "routing_key": "embeddings"},
    "embeddings.dlq": {"exchange": "embeddings.dlq", "routing_key": "dlq"},
}

On RabbitMQ, declare the working queue as a quorum queue (x-queue-type = quorum) so acknowledgments survive a broker node failure. On Redis, enable AOF (appendonly yes) so in-flight messages persist across a restart.

2. Make the task idempotent before it can retry

A retried task must not create a second vector. Derive a deterministic key from content and model version so a replay overwrites in place instead of appending. This is the same guarantee that lets batch chunking strategies for embeddings survive parallel dispatch.

PYTHON

import hashlib

def idempotency_key(doc_id: str, chunk_index: int, text: str, model: str) -> str:
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{doc_id}:{chunk_index}:{model}:{content_hash[:16]}"
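
A quick way to convince yourself the key behaves as described is to call it the way a first attempt, a retry, an edited chunk, and a model change would. This sketch repeats the function so it runs on its own; the document ID and sample texts are made up for illustration:

```python
import hashlib


def idempotency_key(doc_id: str, chunk_index: int, text: str, model: str) -> str:
    # Same derivation as in the step above, repeated so the sketch is self-contained.
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{doc_id}:{chunk_index}:{model}:{content_hash[:16]}"


first_attempt = idempotency_key("doc-42", 0, "hello world", "text-embedding-3-large")
retry_attempt = idempotency_key("doc-42", 0, "hello world", "text-embedding-3-large")
edited_chunk = idempotency_key("doc-42", 0, "hello, world", "text-embedding-3-large")
new_model = idempotency_key("doc-42", 0, "hello world", "text-embedding-3-small")

print(first_attempt == retry_attempt)  # a replay maps to the same key
print(first_attempt != edited_chunk)   # edited text produces a different key
print(first_attempt != new_model)      # so does a different model
```

Because the key depends only on its inputs, a requeued message computes the same value on any worker, which is what lets the write path treat a replay as an overwrite.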

3. Implement the retry policy with jitter and header parsing

Embedding providers enforce RPM/TPM quotas; naive immediate retries turn a 429 into a synchronized retry storm that exhausts broker memory. Respect a server-provided Retry-After, otherwise apply capped exponential backoff with jitter. The backoff math is covered in depth in implementing exponential backoff for embedding API calls.

PYTHON

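import random
from celery import shared_task
from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def _retry_after_seconds(exc: RateLimitError):
    # The provider's hint arrives as an HTTP header on the 429 response.
    try:
        return float(exc.response.headers.get("retry-after"))
    except (TypeError, ValueError):
        return None


@shared_task(bind=True, max_retries=6, acks_late=True,
             queue="embeddings", name="embeddings.generate")
def generate_embeddings(self, doc_id: str, chunk_index: int,
                        text: str, model: str = "text-embedding-3-large"):
    try:
        resp = client.embeddings.create(input=text, model=model)
    except RateLimitError as exc:
        # Prefer the server's Retry-After; otherwise capped exponential backoff with jitter.
        delay = _retry_after_seconds(exc) or min(
            2 ** self.request.retries * random.uniform(0.8, 1.2), 120)
        raise self.retry(exc=exc, countdown=delay)
    except APIConnectionError as exc:
        raise self.retry(exc=exc, countdown=15)
    # normalize_vector is defined in step 4; upsert_vector wraps the statement in step 5.
    vector = normalize_vector(resp.data[0].embedding)
    upsert_vector(doc_id, chunk_index, text, model, vector)
    return idempotency_key(doc_id, chunk_index, text, model)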

Log the retry count and computed delay to your tracing backend (OpenTelemetry/Jaeger) so you can separate a transient network blip from sustained provider degradation.
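
The delay logic is easier to test when it lives outside the task. The following is a minimal standard-library sketch of the two pieces described above: parsing a Retry-After value (either delta-seconds or an HTTP-date) and computing capped, jittered exponential backoff. `parse_retry_after` and `backoff_delay` are hypothetical helper names, and the 0.8 to 1.2 jitter range and 120-second cap simply mirror the task above:

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable."""
    if not value:
        return None
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "7"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(retries: int, retry_after: Optional[float] = None,
                  cap: float = 120.0, rng=None) -> float:
    """Server hint if present, else 2**retries seconds with +/-20% jitter, capped."""
    if retry_after is not None:
        return retry_after
    rng = rng or random
    return min((2 ** retries) * rng.uniform(0.8, 1.2), cap)


for attempt in range(6):
    print(attempt, round(backoff_delay(attempt), 2))
```

Inside the task this would read `countdown=backoff_delay(self.request.retries, parse_retry_after(header))`. Note that with `max_retries=6` the uncapped delay tops out near 38 seconds, so the 120-second cap only matters if the retry budget is raised.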

4. Normalize and cast before persistence

Cosine distance itself ignores vector length, but storing unit-length vectors makes inner-product and cosine rankings agree, so the faster inner-product operator becomes a drop-in choice. Apply L2 normalization in Python before the write, and cast to float32 to match pgvector's single-precision storage so no precision change happens implicitly during bulk loads. The distance-metric implications are detailed in normalizing embeddings before pgvector insertion.

PYTHON

import numpy as np

def normalize_vector(vec: list[float]) -> list[float]:
    arr = np.asarray(vec, dtype=np.float32)
    norm = np.linalg.norm(arr)
    if norm == 0.0:
        return arr.tolist()
    return (arr / norm).tolist()
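
To see why unit length matters, it helps to check the identity directly: once two vectors are L2-normalized, their plain dot product equals the cosine similarity of the originals. A small pure-Python sketch (no numpy, so it runs anywhere; the sample vectors are arbitrary):

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
ua, ub = l2_normalize(a), l2_normalize(b)

# After normalization, the dot product matches the cosine similarity of the raw vectors.
print(math.isclose(dot(ua, ub), cosine_similarity(a, b)))
print(math.isclose(math.sqrt(dot(ua, ua)), 1.0))
```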

5. Upsert idempotently into pgvector

Use the composite natural key as the conflict target and guard the write with a content_hash and model_version comparison, so an unchanged fragment short-circuits while changed text or a new model overwrites in place. Delivery stays at-least-once; the idempotent write is what makes the result effectively exactly-once even when a task runs twice.

SQL

INSERT INTO document_chunks (doc_id, chunk_index, content_hash, model_version, embedding)
VALUES (%(doc_id)s, %(chunk_index)s, %(content_hash)s, %(model)s, %(embedding)s)
ON CONFLICT (doc_id, chunk_index) DO UPDATE
   SET embedding     = EXCLUDED.embedding,
       content_hash  = EXCLUDED.content_hash,
       model_version = EXCLUDED.model_version
 WHERE document_chunks.content_hash  IS DISTINCT FROM EXCLUDED.content_hash
    OR document_chunks.model_version IS DISTINCT FROM EXCLUDED.model_version;
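
The replay behaviour of a guarded upsert can be exercised without a PostgreSQL instance. The sketch below uses Python's built-in sqlite3 as a stand-in, which is an assumption made purely for illustration: SQLite spells the null-safe comparison `IS NOT` rather than `IS DISTINCT FROM`, and the embedding is stored as text instead of a pgvector column. The guard here also compares model_version so that a model change forces a rewrite; treat that as a choice of this sketch.

```python
import sqlite3

UPSERT = """
INSERT INTO document_chunks (doc_id, chunk_index, content_hash, model_version, embedding)
VALUES (:doc_id, :chunk_index, :content_hash, :model, :embedding)
ON CONFLICT (doc_id, chunk_index) DO UPDATE
   SET embedding     = excluded.embedding,
       content_hash  = excluded.content_hash,
       model_version = excluded.model_version
 WHERE document_chunks.content_hash  IS NOT excluded.content_hash
    OR document_chunks.model_version IS NOT excluded.model_version
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document_chunks (
        doc_id        TEXT    NOT NULL,
        chunk_index   INTEGER NOT NULL,
        content_hash  TEXT,
        model_version TEXT,
        embedding     TEXT,   -- stand-in for a pgvector column
        PRIMARY KEY (doc_id, chunk_index)
    )
""")

row = {"doc_id": "doc-42", "chunk_index": 0, "content_hash": "abc",
       "model": "m1", "embedding": "[0.6, 0.8]"}

conn.execute(UPSERT, row)                               # first delivery inserts
conn.execute(UPSERT, {**row, "embedding": "[9, 9]"})    # replay, same hash: skipped
replayed = conn.execute("SELECT embedding FROM document_chunks").fetchall()

conn.execute(UPSERT, {**row, "content_hash": "def", "embedding": "[1.0, 0.0]"})
changed = conn.execute("SELECT embedding FROM document_chunks").fetchall()

print(replayed)  # still one row holding the original embedding
print(changed)   # still one row, now holding the new embedding
```

The replayed write deliberately carries a different embedding string so that the skip is visible: if the guard did not short-circuit, the stored value would have changed.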

6. Route exhausted and poison tasks to a dead-letter queue

Tasks that exceed max_retries or raise an unrecoverable error (malformed input, schema violation) must land in a dead-letter queue rather than vanish. Handle terminal failure explicitly and republish the payload for later reconciliation.

PYTHON

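from celery.signals import task_failure

@task_failure.connect
def to_dead_letter(sender=None, task_id=None, exception=None,
                   args=None, kwargs=None, **extra):
    # task_failure fires only on terminal failure; a retry raises Retry and emits
    # task_retry instead. self.retry(exc=exc) re-raises the original exception once
    # max_retries is exhausted, so do not test for MaxRetriesExceededError here.
    if sender is None or sender.name != "embeddings.generate":
        return
    sender.app.send_task(
        "embeddings.deadletter",
        kwargs={"payload": {"args": list(args or ()), "kwargs": kwargs or {}},
                "task_id": task_id, "error": repr(exception)},
        queue="embeddings.dlq",
    )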

A separate reconciliation worker drains embeddings.dlq, re-validates schema, and either repairs metadata (see handling metadata drift during vector ingestion) or escalates to an alert channel.
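
What "re-validates schema" might look like in the reconciliation worker is sketched below. The field list and the two outcomes are assumptions for illustration: the sketch presumes the worker has already rebuilt a dict of the task's keyword arguments from the dead-lettered payload, and `validate_payload` and `triage` are hypothetical names.

```python
from typing import Any

# Assumed payload contract: the arguments generate_embeddings needs to run again.
REQUIRED_FIELDS = {"doc_id": str, "chunk_index": int, "text": str}


def validate_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means replayable."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        value = payload.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif type(value) is not expected:
            problems.append(f"{field} should be {expected.__name__}, "
                            f"got {type(value).__name__}")
    if isinstance(payload.get("text"), str) and not payload["text"].strip():
        problems.append("text is empty")
    return problems


def triage(payload: dict[str, Any]) -> str:
    """Decide whether a dead-lettered payload can be replayed as-is."""
    return "replay" if not validate_payload(payload) else "escalate"


print(triage({"doc_id": "doc-42", "chunk_index": 3, "text": "some chunk"}))
print(validate_payload({"doc_id": "doc-42", "chunk_index": "3", "text": " "}))
```

Returning the full list of problems rather than failing on the first one gives the alert channel enough detail to repair a record in one pass.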

7. Defer the ANN index during the bulk load window

Maintaining an HNSW graph on every insert throttles write throughput. Load into an unindexed (or freshly truncated staging) table, tune the maintenance session, then build the index once at the end. CONCURRENTLY keeps the table writable during the build at the cost of a slower build; a plain CREATE INDEX blocks writes but not reads. The build-timeout failure mode is covered in resolving pgvector index build timeout errors.

SQL

SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 4;

-- after the bulk ingest completes:
CREATE INDEX CONCURRENTLY idx_chunks_hnsw
    ON document_chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

ANALYZE document_chunks;

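Match m and ef_construction to your recall target using optimizing m and ef_construction parameters. Note that pgvector can index a vector column only up to 2,000 dimensions, so the 3,072-dimension default output of text-embedding-3-large must be shortened with the provider's dimensions parameter or stored as halfvec (pgvector 0.7+) before this index will build.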

Parameter Reference

Parameter	Type	Default	Production recommendation	Notes
`task_acks_late`	bool	`False`	`True`	Acknowledge only after completion so a crashed worker’s task is requeued, not lost.
`task_reject_on_worker_lost`	bool	`False`	`True`	Requeue rather than mark failed when a worker process is killed (OOM, SIGKILL).
`worker_prefetch_multiplier`	int	`4`	`1`	One in-flight task per slot; prevents a stalled batch from starving peers.
`max_retries`	int	`3`	`6`	Balances recovery against queue dwell; pair with capped backoff.
`default_retry_delay`	int (s)	`180`	`2` (+ jitter)	Base for exponential backoff; the task overrides per-error.
`maintenance_work_mem`	mem	`64MB`	`2–4GB` (session)	Session-scoped for the index build; do not leave globally high.
`max_parallel_maintenance_workers`	int	`2`	= physical cores	Parallelizes the index build during the load window; parallel HNSW builds require pgvector 0.6+, and the value is also bounded by max_parallel_workers.
`x-queue-type` (RabbitMQ)	enum	`classic`	`quorum`	Replicated queue survives a broker node failure.
batch `chunk_size`	int (tokens)	—	512–1024	Align to the model’s native window, not its maximum.

Verification

Confirm no chunk was duplicated by comparing the row count against the count of distinct keys, confirm none was dropped by comparing that row count with the number of chunks the producer emitted, and check that the ANN index is actually being used post-load.

SQL

-- 1. Every produced chunk landed exactly once
SELECT count(*)                              AS rows,
       count(DISTINCT (doc_id, chunk_index)) AS distinct_keys
FROM   document_chunks;   -- rows must equal distinct_keys

-- 2. Dead-letter queue drained to zero (RabbitMQ management API or:)
--    redis-cli LLEN embeddings.dlq   -> expect 0

-- 3. Queries hit the index, not a sequential scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT doc_id FROM document_chunks
ORDER BY embedding <=> %(probe)s LIMIT 10;   -- expect an "Index Scan using idx_chunks_hnsw"

PYTHON

# Worker-side view (not broker depth): both lists should empty out after a run
from celery import current_app
insp = current_app.control.inspect()
print(insp.active())     # tasks still executing
print(insp.reserved())   # prefetched but not started (should be small with prefetch=1)
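
A uniqueness check alone cannot reveal a dropped chunk, because a missing row leaves no trace in the table. One way to close that gap is to diff the keys the producer emitted against the keys that were persisted. A minimal sketch, assuming both sides can be exported as (doc_id, chunk_index) pairs; `reconcile` is a hypothetical helper:

```python
from collections import Counter


def reconcile(expected, persisted):
    """Compare producer keys with persisted keys.

    Returns (missing, unexpected, duplicated), each as a sorted list of keys.
    """
    expected_set = set(expected)
    persisted_counts = Counter(persisted)
    missing = sorted(expected_set - persisted_counts.keys())
    unexpected = sorted(persisted_counts.keys() - expected_set)
    duplicated = sorted(k for k, n in persisted_counts.items() if n > 1)
    return missing, unexpected, duplicated


expected_keys = [("doc-1", 0), ("doc-1", 1), ("doc-2", 0)]
persisted_keys = [("doc-1", 0), ("doc-2", 0), ("doc-2", 0), ("doc-9", 0)]

missing, unexpected, duplicated = reconcile(expected_keys, persisted_keys)
print(missing)     # emitted by the producer but never written
print(unexpected)  # written but never emitted
print(duplicated)  # written more than once
```

For large runs the same diff is cheaper to express in SQL against a staging table of expected keys, but the set logic is identical.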

Troubleshooting

Tasks vanish after a worker OOM-kill. The task was acknowledged before completion. Verify task_acks_late = True and task_reject_on_worker_lost = True, then watch the broker while tasks run: rabbitmqctl list_queues name messages_ready messages_unacknowledged. With late acknowledgment, messages_unacknowledged stays non-zero during execution and moves back to messages_ready when the worker dies; if it reads zero while tasks are executing, acks_late is not applied, so check the worker startup log for the effective config.
Duplicate vectors after retries. The conflict target is missing or wrong. Ensure a unique constraint on (doc_id, chunk_index) exists (\d document_chunks) and that the upsert’s ON CONFLICT names it. Without a matching constraint PostgreSQL rejects the ON CONFLICT statement outright, so duplicates usually mean some write path is issuing a plain INSERT against a table that lacks the constraint.
Retry storm exhausts broker memory on a 429 spike. Backoff has no jitter or no cap. Confirm the delay uses min(2 ** retries * random.uniform(...), 120) and that a provider Retry-After header overrides it; monitor broker queue depth and alert above your steady-state ceiling.
Bulk load crawls / WAL floods. The HNSW index is live during insert. Build it after the load with CREATE INDEX CONCURRENTLY, raise maintenance_work_mem for the session only, and batch rows in 5,000–10,000 row COPY/multi-row INSERT groups rather than row-by-row.
Query falls back to a sequential scan post-load. Stale planner statistics — a freshly loaded table has no row estimates. Run ANALYZE document_chunks; right after promoting a batch and re-check EXPLAIN (ANALYZE) for an Index Scan node.
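
The 5,000 to 10,000 row grouping mentioned for bulk loads needs nothing more than a small generator in front of whatever writes the rows. A minimal sketch using only the standard library; `batched_rows` is a hypothetical helper name, and the print stands in for the real COPY or multi-row INSERT call:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_rows(rows: Iterable[T], size: int = 5000) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` rows without loading everything."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


for group in batched_rows(range(12), size=5):
    print(len(group), group[0], group[-1])  # one database round trip per group
```

Because the generator pulls lazily from its input, it works the same over a list, a file reader, or a server-side cursor.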

FAQ

How do I run both an old and a new embedding model without downtime?

Route versioned queues with Celery and keep dual columns (embedding_v1, embedding_v2) on the table. Backfill the new column with a separate task stream, validate recall parity on embedding_v2, then atomically swap the application’s query target via a database view or pooler routing rule and drop the legacy column after a validation window.
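
One simple way to route by version is to pick the queue at dispatch time from the model name. The sketch below is only that: the queue names, the choice of which model counts as legacy, and `queue_for_model` are all hypothetical. Passing `queue=` to `apply_async` is standard Celery and overrides the queue set on the task decorator.

```python
# Hypothetical mapping: one queue per embedding model version.
MODEL_QUEUES = {
    "text-embedding-3-small": "embeddings.v1",  # assumed legacy model
    "text-embedding-3-large": "embeddings.v2",  # assumed new model
}
DEFAULT_QUEUE = "embeddings"


def queue_for_model(model: str) -> str:
    """Return the versioned queue for a model, falling back to the shared queue."""
    return MODEL_QUEUES.get(model, DEFAULT_QUEUE)


print(queue_for_model("text-embedding-3-large"))
print(queue_for_model("some-other-model"))

# At dispatch time, with generate_embeddings being the task defined in step 3:
# generate_embeddings.apply_async(
#     kwargs={"doc_id": doc_id, "chunk_index": i, "text": text, "model": model},
#     queue=queue_for_model(model),
# )
```

Separate queues let the backfill stream run on its own worker pool with its own concurrency, so it cannot starve live ingestion.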

Should embedding API calls use asyncio inside a Celery task?

You can bridge a batch of provider calls with asyncio.run(...) to issue them concurrently from one worker slot without spawning OS threads, but keep the event loop confined to the task body. For the full concurrency model see async processing with Python asyncio.
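
The shape of that bridge is sketched below with a stand-in coroutine in place of a real provider client, so `fake_embed` and its return value are assumptions for illustration. A semaphore bounds the number of in-flight calls, and `asyncio.run` creates and closes the event loop inside the synchronous function a task body would call.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    """Stand-in for an async provider call; swap in a real async client."""
    await asyncio.sleep(0.01)
    return [float(len(text))]


async def embed_batch(texts, max_in_flight: int = 4):
    gate = asyncio.Semaphore(max_in_flight)  # cap concurrent provider calls

    async def one(text):
        async with gate:
            return await fake_embed(text)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(one(t) for t in texts))


def embed_batch_sync(texts):
    """What a Celery task body would call: the loop starts and ends here."""
    return asyncio.run(embed_batch(texts))


vectors = embed_batch_sync(["a", "bb", "ccc"])
print(vectors)
```

Keeping `max_in_flight` well below the provider's rate limit divided by the worker count avoids turning one task's concurrency into a fleet-wide 429 spike.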

RabbitMQ or Redis for the broker?

RabbitMQ with quorum queues gives stronger delivery guarantees and native dead-letter exchanges, which suits ingestion where a lost message is a missing vector. Redis is simpler and faster to operate but needs AOF persistence and manual dead-letter list handling to approach the same safety.

How large should each Celery task’s batch be?

Size the batch to the embedding model’s native context window (512–1024 tokens per fragment) and to VRAM headroom, not to the provider’s maximum request size. Smaller, length-bucketed batches retry cheaply and keep GPU memory predictable.
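
Length bucketing can be sketched in a few lines: group fragments whose token counts fall in the same band, then cut each band into batches of bounded size. The whitespace token count below is a crude stand-in for the model's real tokenizer, and `bucket_by_length`, the 256-token band width, and the batch limit of 32 are assumptions for illustration.

```python
from collections import defaultdict


def approx_tokens(text: str) -> int:
    """Crude token estimate; replace with the embedding model's tokenizer."""
    return len(text.split())


def bucket_by_length(texts, bucket_width: int = 256, max_batch: int = 32):
    """Group texts of similar length, then split each group into bounded batches."""
    buckets = defaultdict(list)
    for text in texts:
        buckets[approx_tokens(text) // bucket_width].append(text)

    batches = []
    for key in sorted(buckets):  # shortest fragments first
        items = buckets[key]
        for start in range(0, len(items), max_batch):
            batches.append(items[start:start + max_batch])
    return batches


chunks = ["a b", "c " * 300, "d e f"]
for batch in bucket_by_length(chunks):
    print(len(batch), [approx_tokens(t) for t in batch])
```

Batches of similar length waste less padding on a GPU and fail more uniformly, so a retry re-sends fragments of comparable cost rather than one oversized outlier alongside many small ones.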

Why route to a dead-letter queue instead of just logging the error?

A dead-letter queue preserves the full payload for reconciliation, so a malformed or drifted record can be repaired and replayed rather than manually reconstructed. Logging alone loses the input that caused the failure.

Related

Implementing Exponential Backoff for Embedding API Calls: the retry math this pipeline depends on.
Normalizing Embeddings Before pgvector Insertion: unit-length vectors and float32 casting on the write path.
Handling Metadata Drift During Vector Ingestion: reconciling dead-lettered payloads whose schema moved.
Resolving pgvector Index Build Timeout Errors: building the ANN index after the bulk load without stalling.
