Vector databases are everywhere in AI talk. But in real systems, the best results usually come from a retrieval stack, not “a vector DB alone”.
This is a rewritten and updated version of the post (late 2025). It focuses less on code and more on:
A vector database stores embeddings: lists of numbers that represent meaning. Similar things (for example, two paragraphs about “employee onboarding”) end up close together in vector space.
At query time, you embed the question, then run a nearest neighbor search to retrieve the most similar items.
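As a minimal sketch of that query step, the snippet below runs an exact (brute-force) nearest neighbor search with cosine similarity. The three-dimensional vectors and document names are made up for illustration; a real system would get embeddings from a model and would typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, index, k=2):
    """Score every stored vector against the query and return the top k."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" (assumption: hand-written, not from a model).
TOY_INDEX = {
    "onboarding-checklist": [0.9, 0.1, 0.0],
    "new-hire-handbook": [0.8, 0.2, 0.1],
    "quarterly-revenue": [0.0, 0.1, 0.9],
}

# Stand-in for the embedded user question.
query_vec = [0.85, 0.15, 0.05]
top = nearest_neighbors(query_vec, TOY_INDEX, k=2)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

This scans every stored vector, which is fine for a demo and exactly what stops scaling once the collection grows.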
In production, you usually want both.
RAG isn’t “put docs in a vector DB”. It’s a chain with multiple quality levers.
flowchart LR
  classDef store fill:#0b1220,stroke:#334155,color:#e5e7eb;
  classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
  classDef good fill:#052e1a,stroke:#22c55e,color:#dcfce7;
  classDef warn fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
  subgraph Ingest["Ingest (offline)"]
    D["Docs / DBs / tickets / PDFs"]:::store --> C["Chunk + clean"]:::step
    C --> M["Metadata\n(source, owner, ACL, timestamps)"]:::step
    C --> E["Embed"]:::step --> V["Vector index"]:::store
    C --> T["Full-text index (BM25)"]:::store
  end
  subgraph Query["Query (online)"]
    Q["User question"]:::step --> QE["Query rewrite\n(optional)"]:::step
    QE --> R1["Lexical retrieve\n(BM25)"]:::step
    QE --> R2["Vector retrieve\n(ANN)"]:::step
    R1 --> H["Hybrid merge"]:::step
    R2 --> H
    H --> RR["Re-rank\n(cross-encoder)"]:::warn
    RR --> CTX["Top context"]:::good --> LLM["LLM answer"]:::good
  end
Key point: a vector DB is one piece. Retrieval quality depends on chunking, metadata, filters, hybrid search, and reranking.
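The “hybrid merge” step can be implemented in several ways. One common choice, used here purely as an illustration and not necessarily what any particular engine does, is reciprocal rank fusion (RRF): each document earns `1 / (k + rank)` from every result list it appears in. The document IDs and the constant `k=60` below are assumptions for the sketch.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list using RRF.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is a commonly used default (assumption).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
bm25_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-c", "doc-a", "doc-d"]

merged = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(merged)
```

RRF only looks at ranks, so it avoids having to calibrate BM25 scores against vector similarities. The merged list would then go to the reranker.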
Exact nearest neighbor search is slow at scale. So most systems use Approximate Nearest Neighbor (ANN) indexing.
Here are the names you’ll see in production engines:
Distance metrics you’ll see:
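As a rough illustration, three metrics that show up in most engines are Euclidean (L2) distance, inner product, and cosine distance. The sketch below uses made-up two-dimensional vectors. It also checks a handy identity: for unit-length vectors, squared L2 distance equals `2 - 2 * dot`, so all three metrics produce the same ranking.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: smaller means more similar."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy vectors (assumption: hand-written, not real embeddings).
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

squared_l2 = l2_distance(a, b) ** 2
print(squared_l2, 2 - 2 * dot(a, b), 2 * cosine_distance(a, b))
```

If your embedding model already returns normalized vectors, the choice between these metrics matters less for ranking than for speed and index support.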
No. A vector database is great when:
But it is not always the best option when:
Here’s a simple chooser:
flowchart TB
  classDef q fill:#111827,stroke:#6366f1,color:#eef2ff;
  classDef a fill:#052e1a,stroke:#22c55e,color:#dcfce7;
  classDef n fill:#0b1220,stroke:#334155,color:#e5e7eb;
  Q["What kind of question is this?"]:::q
  Q -->|Mostly numbers, filters, joins| SQL["SQL / BI semantic layer"]:::a
  Q -->|Find specific phrases / compliance clauses| FT["Full-text search (BM25)"]:::a
  Q -->|Natural language, fuzzy matching| VS["Vector search (ANN)"]:::a
  Q -->|Needs connections across many docs| KG["Knowledge graph / GraphRAG"]:::a
  VS --> HY["Often best: Hybrid (BM25 + vectors) + reranker"]:::a
  FT --> HY
  KG --> HY
You can. And sometimes you should.
The practical question is not “can it store vectors?”, it’s:
pgvector: updated (and better than it used to be)
pgvector has improved a lot recently:
halfvec, sparsevec, binary vectors, quantization options, and more distance functions (useful for memory and certain workloads). (pgvector 0.7.0 release)
Practical guidance (2025):
pgvector can be a very good “single database” option.
Good when your data already lives in MongoDB and you want to keep one operational system. Still apply the same rules: measure recall, latency, and filtering behavior on your dataset.
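One way to “measure recall” in practice is recall@k: run the same query through an exact search and through your ANN index, then check how many of the true top-k results the ANN index returned. The helper and the ID lists below are hypothetical; in a real test you would average this over many representative queries, with your production filters applied.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that also appear in the ANN top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical results for one query (IDs are made up).
exact = ["d1", "d2", "d3", "d4", "d5"]   # from a brute-force scan
approx = ["d1", "d3", "d9", "d2", "d7"]  # from the ANN index

score = recall_at_k(approx, exact, k=5)
print(f"recall@5 = {score:.2f}")
```

Tracking this number next to latency makes index tuning a visible trade-off instead of a guess.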
In production, your “knowledge” is spread across:
Your retrieval system has to unify this without creating a security nightmare.
flowchart LR
  classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
  classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
  classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
  classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
  subgraph Sources["Data silos"]
    S1["Docs"]:::sys
    S2["Tickets"]:::sys
    S3["DBs"]:::sys
    S4["Code"]:::sys
  end
  subgraph Pipeline["Ingestion + governance"]
    Conn["Connectors\n(incremental sync)"]:::step
    ACL["ACL + tenancy mapping"]:::guard
    Meta["Metadata + lineage"]:::step
    Redact["PII redaction\n(optional)"]:::guard
  end
  subgraph Retrieval["Retrieval layer"]
    FT["Full-text index (BM25)"]:::sys
    VX["Vector index (ANN)"]:::sys
    Rerank["Reranker"]:::step
  end
  subgraph App["AI app"]
    Policy["AuthZ check\nat query time"]:::guard
    LLM["Answer + citations"]:::out
  end
  Sources --> Conn --> ACL --> Meta
  Meta --> FT
  Meta --> VX
  FT --> Rerank --> Policy --> LLM
  VX --> Rerank
  Redact --> Meta
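The “AuthZ check at query time” box is the part that keeps unified retrieval from leaking data. A minimal sketch, assuming each chunk carries the tenant and group ACL captured at ingestion (the `Chunk` type, field names, and sample data are all hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    tenant: str
    allowed_groups: frozenset  # groups permitted to read this chunk


def authorize(chunks, user_tenant, user_groups):
    """Keep only chunks the caller may see, preserving retrieval order."""
    groups = set(user_groups)
    return [
        chunk
        for chunk in chunks
        if chunk.tenant == user_tenant and groups & chunk.allowed_groups
    ]


# Hypothetical reranked results, best first.
retrieved = [
    Chunk("c1", "Refund policy for support agents", "acme", frozenset({"support"})),
    Chunk("c2", "Salary bands 2025", "acme", frozenset({"hr"})),
    Chunk("c3", "Refund policy (other customer)", "globex", frozenset({"support"})),
]

visible = authorize(retrieved, user_tenant="acme", user_groups={"support"})
print([chunk.chunk_id for chunk in visible])
```

Filtering after retrieval is the simplest form of this check. Many engines can also apply the same metadata as a filter during the search itself, which avoids losing result slots to chunks the user cannot see.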
Instead of a long list, here’s the easy breakdown:
pgvector, MongoDB Atlas Vector Search
