
Vector Databases: What They Are and How They Power AI

By Jhony Vidal
February 24, 2025
3 min read

Vector databases are everywhere in AI conversations. But in real systems, the best results usually come from a retrieval stack, not from a vector DB alone.

This is a rewritten and updated version of the post (late 2025). It focuses less on code and more on:

  • What vector search is (in simple terms)
  • Which algorithms make it work at scale
  • When a vector database is the right tool (and when it’s not)
  • How to build production-ready RAG when your data lives across many silos

What a vector database really is

A vector database stores embeddings: lists of numbers that represent meaning. Similar things (two paragraphs about “employee onboarding”) end up close together in vector space.

At query time, you embed the question, then run a nearest neighbor search to retrieve the most similar items.
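The query-time flow can be sketched in a few lines. This is a toy illustration, not a real engine: the 4-dimensional vectors and document names are made up, and in practice the embeddings come from an embedding model, not hand-written arrays.

```python
import numpy as np

# Toy "embeddings" -- in a real system these come from an embedding model.
docs = {
    "onboarding-guide":   np.array([0.9, 0.1, 0.0, 0.1]),
    "vacation-policy":    np.array([0.2, 0.9, 0.1, 0.0]),
    "new-hire-checklist": np.array([0.8, 0.2, 0.1, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query_vec, k=2):
    # Score every document against the query, return the top-k names.
    scored = [(cosine(query_vec, v), name) for name, v in docs.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Pretend this is the embedding of "how do I onboard a new employee?"
query = np.array([0.85, 0.15, 0.05, 0.05])
print(nearest(query))  # -> ['onboarding-guide', 'new-hire-checklist']
```

At toy scale this brute-force loop is fine; the ANN indexes discussed below exist precisely because scoring every vector stops being feasible at millions of documents.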

Characteristics of Vector Databases
Characteristics of Vector Databases

The shortest mental model

  • Full-text search answers: “does the text contain the words I typed?”
  • Vector search answers: “does the text mean something similar to what I typed?”

In production, you usually want both.


How RAG uses retrieval (the modern pipeline)

RAG isn’t “put docs in a vector DB”. It’s a chain with multiple quality levers.

flowchart LR
classDef store fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef good fill:#052e1a,stroke:#22c55e,color:#dcfce7;
classDef warn fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
subgraph Ingest["Ingest (offline)"]
D["Docs / DBs / tickets / PDFs"]:::store --> C["Chunk + clean"]:::step
C --> M["Metadata\n(source, owner, ACL, timestamps)"]:::step
C --> E["Embed"]:::step --> V["Vector index"]:::store
C --> T["Full-text index (BM25)"]:::store
end
subgraph Query["Query (online)"]
Q["User question"]:::step --> QE["Query rewrite\n(optional)"]:::step
QE --> R1["Lexical retrieve\n(BM25)"]:::step
QE --> R2["Vector retrieve\n(ANN)"]:::step
R1 --> H["Hybrid merge"]:::step
R2 --> H
H --> RR["Re-rank\n(cross-encoder)"]:::warn
RR --> CTX["Top context"]:::good --> LLM["LLM answer"]:::good
end

Key point: a vector DB is one piece. Retrieval quality depends on chunking, metadata, filters, hybrid, and reranking.
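Chunking is the first of those levers, and the simplest baseline is worth seeing concretely. A minimal sketch (fixed-size character windows with overlap; production pipelines usually split on headings or sentences instead):

```python
def chunk(text, size=200, overlap=40):
    """Fixed-size chunking with overlap -- the simplest baseline.

    Overlap keeps a sentence that straddles a boundary visible
    in at least one chunk.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "".join(chr(97 + i % 26) for i in range(500))  # stand-in document
pieces = chunk(doc)
print(len(pieces))  # -> 3
```

Each chunk would then be embedded and indexed alongside its metadata (source, owner, ACL, timestamps), as in the ingest half of the diagram above.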


The algorithms behind vector search (the part people skip)

Exact nearest neighbor search is slow at scale. So most systems use Approximate Nearest Neighbor (ANN) indexing.

Here are the names you’ll see in production engines:

  • HNSW (Hierarchical Navigable Small World): graph-based index, often great recall/latency, can be memory heavy.
  • IVF (Inverted File Index): clusters vectors, searches within the most relevant clusters.
  • PQ (Product Quantization): compresses vectors to save memory and improve speed (trade-off: accuracy).
  • IVF+PQ: common combo in FAISS-style systems.
  • DiskANN-style approaches: focus on large-scale search where the index lives on disk/SSD.
  • ScaNN-style approaches: optimized ANN for high performance and high recall in certain setups.
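To make the IVF idea concrete, here is a deliberately simplified sketch in pure NumPy: partition the vectors into clusters, and at query time search only the few clusters closest to the query. Real systems (FAISS and friends) train the centroids with k-means; this sketch just samples them, which is enough to show the probe-vs-recall trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32)).astype(np.float32)

# "Training": pick coarse centroids. Real engines run k-means here;
# random sampling keeps the sketch short.
n_clusters = 16
centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1)

def ivf_search(query, k=5, nprobe=4):
    # 1. Find the nprobe clusters whose centroids are closest to the query.
    cluster_order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    # 2. Exhaustively search only the vectors assigned to those clusters.
    candidates = np.where(np.isin(assignments, cluster_order))[0]
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

hits = ivf_search(vectors[0])
```

Raising `nprobe` searches more clusters (better recall, slower); PQ would additionally compress the vectors inside each cluster, which is the IVF+PQ combo mentioned above.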

Distance metrics you’ll see:

  • Cosine similarity (often with normalized vectors)
  • Dot product
  • Euclidean (L2)
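The three metrics, side by side in NumPy. One practical detail worth knowing: on unit-normalized vectors, cosine similarity and dot product produce the same ranking, which is why many engines normalize at ingest and then use the cheaper dot product.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, 2.0])

dot = float(a @ b)                                    # dot product -> 8.0
l2 = float(np.linalg.norm(a - b))                     # Euclidean (L2) -> sqrt(2)
cos = dot / float(np.linalg.norm(a) * np.linalg.norm(b))  # cosine -> 8/9

# On unit-normalized vectors, cosine equals the dot product.
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert abs(float(an @ bn) - cos) < 1e-9
```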

Should you always use a vector database?

No. A vector database is great when:

  • Your users ask in natural language and your content is unstructured
  • Keyword search fails because synonyms and phrasing vary
  • You need semantic matching across many documents

But it is not always the best option when:

  • The question is structured (“sum revenue by month”) → use SQL/semantic models
  • The content is small enough to search with full-text + reranking
  • You need “global answers” that require connecting entities across a corpus → consider knowledge graphs / GraphRAG-style approaches

Here’s a simple chooser:

flowchart TB
classDef q fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef a fill:#052e1a,stroke:#22c55e,color:#dcfce7;
classDef n fill:#0b1220,stroke:#334155,color:#e5e7eb;
Q["What kind of question is this?"]:::q
Q -->|Mostly numbers, filters, joins| SQL["SQL / BI semantic layer"]:::a
Q -->|Find specific phrases / compliance clauses| FT["Full-text search (BM25)"]:::a
Q -->|Natural language, fuzzy matching| VS["Vector search (ANN)"]:::a
Q -->|Needs connections across many docs| KG["Knowledge graph / GraphRAG"]:::a
VS --> HY["Often best: Hybrid (BM25 + vectors) + reranker"]:::a
FT --> HY
KG --> HY
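The "hybrid merge" box deserves a concrete sketch. Reciprocal Rank Fusion (RRF) is a common way to combine a BM25 result list and a vector result list without having to reconcile their incompatible score scales; the doc IDs below are invented, and `k=60` is the value commonly used in the literature.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank)
    per document; documents ranked well in several lists win."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc-err-42", "doc-a", "doc-b"]  # exact term match ("ERR-42") wins here
vect = ["doc-a", "doc-c", "doc-err-42"]  # semantic neighbors
print(rrf_merge([bm25, vect]))  # -> ['doc-a', 'doc-err-42', 'doc-c', 'doc-b']
```

Note how `doc-a`, which appears near the top of both lists, outranks documents that only one retriever liked; a cross-encoder reranker would then re-score this merged shortlist.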

“Why not just use PostgreSQL or MongoDB?”

You can. And sometimes you should.

The practical question is not “can it store vectors?”, it’s:

  • How fast can it search at your scale?
  • Can it filter by metadata/ACL efficiently?
  • Can it handle updates without breaking latency?
  • Can you operate it reliably (backup, monitoring, cost)?

Challenges of Using Traditional Databases for AI

PostgreSQL (pgvector) — updated (and better than it used to be)

pgvector has improved a lot recently:

  • v0.7.0 added halfvec, sparsevec, binary vectors, quantization options, and more distance functions (useful for memory and certain workloads). (pgvector 0.7.0 release)
  • v0.8.0 improved filtered search and added iterative index scans to avoid “overfiltering” (where filtering kills recall), plus better HNSW build/search performance. (pgvector 0.8.0 release)

Practical guidance (2025):

  • For small projects and early production (tens/hundreds of thousands of vectors, modest QPS), pgvector can be a very good “single database” option.
  • If your workload becomes “search-first” (millions of vectors, high QPS, complex filters), you’ll often outgrow a single Postgres node and move to a dedicated search engine/vector DB.
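The "overfiltering" problem that pgvector 0.8.0 addresses is easy to demonstrate. In this synthetic sketch (random vectors, an invented `tenant` metadata column), post-filtering an ANN result list through a selective filter returns far fewer hits than requested, while filtering first preserves them:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 8))
tenant = rng.integers(0, 50, size=500)  # metadata: ~2% of rows match any tenant

def topk(query, k):
    d = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(d)[:k]

query = rng.normal(size=8)
want = 5  # results needed for tenant 7

# Post-filter: take the global top-k first, THEN apply the metadata filter.
# With a selective filter, most of the top-k gets discarded.
post = [i for i in topk(query, k=20) if tenant[i] == 7]

# Pre-filter: restrict to matching rows, then rank only those.
allowed = np.where(tenant == 7)[0]
d = np.linalg.norm(vectors[allowed] - query, axis=1)
pre = allowed[np.argsort(d)[:want]]
```

Pre-filtering has its own cost (the index is less useful on an arbitrary subset), which is why iterative index scans, which keep fetching candidates until the filter is satisfied, are the practical middle ground.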

MongoDB (Atlas Vector Search)

Good when your data already lives in MongoDB and you want to keep one operational system. Still apply the same rules: measure recall, latency, and filtering behavior on your own dataset.


Production reality: most teams have data silos

In production, your “knowledge” is spread across:

  • Docs (Notion/Confluence/Google Drive)
  • Tickets (Jira/ServiceNow)
  • Source code (GitHub)
  • Databases (Postgres, Snowflake, etc.)
  • PDFs, emails, chat logs

Your retrieval system has to unify this without creating a security nightmare.

flowchart LR
classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
subgraph Sources["Data silos"]
S1["Docs"]:::sys
S2["Tickets"]:::sys
S3["DBs"]:::sys
S4["Code"]:::sys
end
subgraph Pipeline["Ingestion + governance"]
Conn["Connectors\n(incremental sync)"]:::step
ACL["ACL + tenancy mapping"]:::guard
Meta["Metadata + lineage"]:::step
Redact["PII redaction\n(optional)"]:::guard
end
subgraph Retrieval["Retrieval layer"]
FT["Full-text index (BM25)"]:::sys
VX["Vector index (ANN)"]:::sys
Rerank["Reranker"]:::step
end
subgraph App["AI app"]
Policy["AuthZ check\nat query time"]:::guard
LLM["Answer + citations"]:::out
end
Sources --> Conn --> ACL --> Meta
Meta --> FT
Meta --> VX
FT --> Rerank --> Policy --> LLM
VX --> Rerank
Redact --> Meta

Best practices for multi-silo RAG

  • Treat access control as data: store document-level permissions and enforce them at query time (not only at ingest time).
  • Use hybrid retrieval (BM25 + vectors): it improves recall for exact terms (IDs, error codes) and still handles meaning.
  • Rerank your top candidates: it’s often the single biggest quality lift.
  • Track freshness: incremental sync, timestamps, and clear “last updated” behavior.
  • Log retrieval (not just answers): you need to know which docs were retrieved, filtered out, and used.
  • Measure: recall@k, nDCG, answer groundedness, latency, and cost.
  • Avoid one giant index if you have tenants/domains with different rules; shard by tenant or by domain when it helps.
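The first practice above, enforcing permissions at query time, can be sketched minimally. Everything here is a hypothetical shape, not a real API: each retrieved chunk carries its source document's ACL as metadata, and authorization is re-checked per query, so a permission change takes effect immediately instead of waiting for the next re-index.

```python
# Hypothetical chunk/user shapes for illustration.
def authorized(chunk, user):
    # A chunk is visible if the user shares at least one allowed group.
    acl = chunk["metadata"].get("allowed_groups", set())
    return bool(acl & user["groups"])

def filter_results(retrieved, user, k=5):
    # Enforce ACLs AFTER retrieval, per query -- not only at ingest time.
    return [c for c in retrieved if authorized(c, user)][:k]

retrieved = [
    {"id": "hr-1",  "metadata": {"allowed_groups": {"hr", "managers"}}},
    {"id": "eng-1", "metadata": {"allowed_groups": {"eng"}}},
]
alice = {"groups": {"eng"}}
print([c["id"] for c in filter_results(retrieved, alice)])  # -> ['eng-1']
```

In production you would also pre-filter by ACL inside the index when the engine supports it (for the recall reasons discussed above), and keep the query-time check as the last line of defense.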

The vector database landscape

Instead of a long vendor list, here's the easy breakdown:

  • Managed vector DBs (convenient ops): Pinecone, etc.
  • Open-source vector engines (self-host): Milvus, Weaviate, Qdrant, etc.
  • Search engines with vectors (great hybrid story): Elasticsearch/OpenSearch-style approaches
  • “One stack” databases for smaller projects: PostgreSQL + pgvector, MongoDB Atlas Vector Search
  • Libraries: FAISS (you embed it inside your service)

Which vector database solution should I choose?

