
Hybrid Search (FTS5 + RRF)

Pure KNN embedding search excels at finding semantically similar entities — synonyms, paraphrases, even cross-lingual matches. But it fails when users search by exact terms such as proper names, identifiers, or technical jargon. The vector representation of “FTS5” won’t reliably match a passage that literally contains “FTS5” — the model operates in meaning space, not token space.

FTS5 (BM25) is the mirror image: excellent at literal token matching but blind to semantic relationships. A search for “vector database” won’t find an entity whose observation says “embedding storage” unless the exact tokens overlap.

Hybrid search combines both signals to get the best of both worlds:

| Method | Strength | Weakness |
|---|---|---|
| KNN (semantic) | Synonyms, paraphrases, cross-lingual | Exact terms, rare words, IDs |
| FTS5 (BM25) | Exact terms, names, IDs, jargon | Semantic understanding, synonyms |
| Hybrid (RRF) | Both | Slightly more complex pipeline |

When you call search_semantic("FTS5 configuration"), the engine runs a six-step pipeline that branches into parallel searches, merges them, and re-ranks with limbic signals:

```mermaid
graph TD
    Q["① Encode query<br/>engine.encode(text, task='query')<br/>→ float32[384]"]
    Q --> KNN["② Semantic branch (KNN)<br/>sqlite-vec KNN search<br/>→ 3 × limit candidates"]
    Q --> FTS["② Full-text branch (FTS5)<br/>BM25 on name, entity_type, obs_text<br/>→ 3 × limit candidates"]
    KNN -->|"[{entity_id, distance}]"| RRF["③ RRF Merge<br/>score = Σ 1/(k + rank_i)<br/>k = 60"]
    FTS -->|"[{entity_id, rank}]"| RRF
    RRF -->|"[{entity_id, rrf_score, dist?}]"| Limbic["④ Limbic Re-rank<br/>salience · temporal · cooc<br/>→ rank_hybrid_candidates()"]
    Limbic --> Hydrate["⑤ Hydrate entities<br/>get_entity_by_id()<br/>+ get_observations()"]
    Hydrate --> Track["⑥ Track access signals<br/>record_access()<br/>+ record_co_occurrences()"]
    Track --> Output["Output: [{name, entityType, observations,<br/>limbic_score, scoring, distance, rrf_score}]"]
```

① Encode query — The query text is prefixed with "query: " and encoded into a 384-dimensional vector by the ONNX embedding engine. This is the same encoding pipeline used for Semantic Search, operating in query mode.

② Parallel retrieval — Two independent searches run as separate branches against the same query:

  • Semantic (KNN): The query vector is compared against all entity embeddings stored in sqlite-vec. To leave room for re-ranking, the engine retrieves 3 × limit candidates (over-retrieval). The distance metric is cosine: d = 1 - cos(A, B).
  • Full-text (FTS5): The raw query text is searched against a BM25 index covering entity names, types, and observation content. Also retrieves 3 × limit candidates.
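The FTS5 branch can be sketched with Python's bundled sqlite3 module (assuming its SQLite build ships FTS5, as most modern CPython builds do). The table schema follows the one shown later on this page; the sample data is illustrative:

```python
import sqlite3

# Sketch of the FTS5 branch (②) against an in-memory database.
# In the real engine this runs alongside the sqlite-vec KNN branch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE VIRTUAL TABLE entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
INSERT INTO entity_fts(rowid, name, entity_type, obs_text) VALUES
  (1, 'FTS5 configuration', 'Doc', 'tokenizer options | unicode61'),
  (2, 'Vector search', 'Doc', 'embedding storage | KNN');
""")

limit = 10
fts_results = [
    {"entity_id": row[0], "rank": i + 1}
    for i, row in enumerate(conn.execute(
        "SELECT rowid FROM entity_fts WHERE entity_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        ("FTS5", 3 * limit),  # over-retrieve 3 × limit candidates
    ))
]
```

Only entity 1 contains the literal token "FTS5", so `fts_results` holds a single candidate at rank 1. Note that `ORDER BY rank` sorts by BM25 relevance, best match first.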

③ RRF Merge — Results from both branches are merged using Reciprocal Rank Fusion (RRF). This step produces a unified ranking where entities found by both methods receive a boost. See Reciprocal Rank Fusion below.

④ Limbic Re-rank — The merged candidates are scored by the Limbic System, which applies salience, temporal decay, and co-occurrence boosts. In hybrid mode this uses rank_hybrid_candidates() instead of rank_candidates().

⑤ Hydrate entities — The top-K entity IDs are resolved into full entities with their names, types, and observations from the SQLite database.

⑥ Track access signals — After building the response, the engine records which entities were accessed and which appeared together (co-occurrences). This is best-effort and does not affect the returned results, but feeds future limbic scoring.

The full-text index is a SQLite FTS5 virtual table:

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
```

| Column | Type | Description |
|---|---|---|
| name | TEXT | Entity name — directly searchable |
| entity_type | TEXT | Entity type — enables type-based queries (“Project”, “Session”) |
| obs_text | TEXT | All observations concatenated with `" \| "` separator |
| rowid | INTEGER | Implicit — corresponds to entities.id for JOIN-free lookups |

Tokenizer: unicode61 — correctly handles accented characters (é, ñ, ü) and other Unicode. This is essential for a multilingual knowledge graph.
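A quick check of the unicode61 tokenizer using Python's bundled sqlite3 (assuming an FTS5-enabled SQLite build). With its default diacritic removal, an unaccented query should match accented text:

```python
import sqlite3

# unicode61's default remove_diacritics setting folds "café" to "cafe"
# at indexing time, so the plain-ASCII query still matches.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE VIRTUAL TABLE demo USING fts5(txt, tokenize="unicode61")')
conn.execute("INSERT INTO demo VALUES ('café con leche')")
hits = conn.execute("SELECT txt FROM demo WHERE demo MATCH 'cafe'").fetchall()
```

This diacritic folding is what makes the index robust for multilingual content without per-language configuration.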

The FTS table is maintained at the code level, not via SQLite triggers. The _sync_fts(entity_id) method reads the entity’s current state from the DB and executes INSERT OR REPLACE in the FTS table:

| Operation | Method invoked | Behavior |
|---|---|---|
| upsert_entity | _sync_fts(entity_id) | INSERT OR REPLACE with current data |
| add_observations | _sync_fts(entity_id) | Rebuilds obs_text from DB |
| delete_observations | _sync_fts(entity_id) | Rebuilds obs_text from DB |
| delete_entities | Direct DELETE by rowid | Manual deletion (FTS5 doesn’t support CASCADE) |
| init_db (backfill) | _backfill_fts() | Populates from existing entities if FTS is empty |
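A hypothetical sketch of the code-level sync, assuming `entities(id, name, entity_type)` and `observations(entity_id, content)` source tables (those names are illustrative; only `entity_fts` matches the schema shown above):

```python
import sqlite3

def sync_fts(conn: sqlite3.Connection, entity_id: int) -> None:
    """Read the entity's current state and upsert its FTS row (sketch)."""
    row = conn.execute(
        "SELECT name, entity_type FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        return  # entity gone; FTS row is deleted separately by rowid
    obs_text = " | ".join(
        r[0] for r in conn.execute(
            "SELECT content FROM observations WHERE entity_id = ? ORDER BY id",
            (entity_id,),
        )
    )
    # INSERT OR REPLACE keyed on rowid keeps exactly one FTS row per entity
    conn.execute(
        "INSERT OR REPLACE INTO entity_fts(rowid, name, entity_type, obs_text) "
        "VALUES (?, ?, ?, ?)",
        (entity_id, row[0], row[1], obs_text),
    )

# Demo: minimal schema, one entity, one observation
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities(id INTEGER PRIMARY KEY, name TEXT, entity_type TEXT);
CREATE TABLE observations(id INTEGER PRIMARY KEY, entity_id INTEGER, content TEXT);
CREATE VIRTUAL TABLE entity_fts USING fts5(name, entity_type, obs_text, tokenize="unicode61");
INSERT INTO entities VALUES (1, 'CachorroSpace', 'Project');
INSERT INTO observations(entity_id, content) VALUES (1, 'Built with Astro Starlight');
""")
sync_fts(conn, 1)
synced = conn.execute("SELECT name, obs_text FROM entity_fts WHERE rowid = 1").fetchone()
```

Keeping the sync in code rather than triggers makes the rebuild logic explicit and testable, at the cost of having to call it from every mutating operation.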

The fusion of rankings uses the standard RRF formula:

rrf_score(d) = Σ_{i ∈ rankings} 1 / (k + rank_i(d))

Where:

  • rank_i(d) = 1-based position of document d in ranking i
  • k = smoothing constant (RRF_K = 60, standard value from the original paper)

Why it works: RRF doesn’t require scores to be comparable across systems. KNN produces cosine distances and FTS5 produces BM25 ranks — different scales, different distributions. RRF only cares about position in each ranking, making it ideal for heterogeneous retrieval.

| Scenario | Effect |
|---|---|
| Entity in both rankings | Receives score from both → boosted to the top |
| Entity in KNN only | Receives partial score from its KNN rank |
| Entity in FTS5 only | Receives partial score from its BM25 rank |

Given limit = 10, each branch retrieves 3 × 10 = 30 candidates:

| KNN rank | Entity | FTS5 rank | RRF score |
|---|---|---|---|
| 1 | Entity A | 3 | 1/(60+1) + 1/(60+3) ≈ 0.0323 |
| 2 | Entity B | — | 1/(60+2) ≈ 0.0161 |
| — | Entity C | 1 | 1/(60+1) ≈ 0.0164 |
| 5 | Entity D | 2 | 1/(60+5) + 1/(60+2) ≈ 0.0315 |

Entity A appears in both rankings at positions 1 and 3, receiving the highest combined score. Entity C appears only in FTS5 but at rank 1, so it edges out Entity B (KNN rank 2).

```python
def reciprocal_rank_fusion(
    semantic_results: list[dict],  # [{entity_id, distance}] ordered by distance
    fts_results: list[dict],       # [{entity_id, rank}] ordered by BM25 rank
    k: int = RRF_K,                # 60
) -> list[dict]:
    """Return [{entity_id, rrf_score, distance | None}] sorted by rrf_score desc."""
    ...
```

When hybrid search is active, the Limbic System uses rank_hybrid_candidates() instead of rank_candidates(). The key difference is how base relevance is calculated:

| Entity source | base_relevance | Source |
|---|---|---|
| KNN + FTS (both) | max(0, 1 - distance) | Cosine similarity from KNN |
| KNN only | max(0, 1 - distance) | Cosine similarity from KNN |
| FTS only (no KNN) | 0.2 + 0.6 × norm_rrf | RRF normalized to [0.2, 0.8] |

Entities found only by FTS5 have no KNN distance (distance = None). Without a vector similarity signal, we can’t use the cosine formula. Instead, their RRF score is normalized min-max to the range [0.2, 0.8]:

```python
norm_rrf = (rrf_score - rrf_min) / rrf_range
base_relevance = 0.2 + 0.6 * norm_rrf  # → [0.2, 0.8]
```

The bounds prevent FTS-only entities from dominating (ceiling at 0.8) or being buried (floor at 0.2). The limbic components then apply on top:

limbic_score = base_relevance × (1 + β_sal × importance) × temporal × (1 + γ × cooc_boost)

This is the same composite formula used in pure semantic mode — only base_relevance changes.
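The base-relevance rules above can be sketched as a single function. The degenerate case of a zero RRF spread (only one FTS-only candidate) is an assumption of this sketch, mapped to the midpoint:

```python
def base_relevance(distance, rrf_score, rrf_min, rrf_range):
    """Base relevance per the hybrid rules (sketch, not the engine's code)."""
    if distance is not None:
        # Entity was seen by KNN: cosine similarity, clamped at 0
        return max(0.0, 1.0 - distance)
    if rrf_range == 0:
        # Assumption: single FTS-only candidate → midpoint of [0.2, 0.8]
        return 0.5
    # FTS-only: min-max-normalized RRF score mapped into [0.2, 0.8]
    norm_rrf = (rrf_score - rrf_min) / rrf_range
    return 0.2 + 0.6 * norm_rrf
```

The limbic multipliers (salience, temporal, co-occurrence) then apply on top of this value regardless of which branch produced it.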

The pipeline automatically chooses between hybrid and pure semantic mode based on FTS5 availability:

```mermaid
graph TD
    Search["search_semantic(query, limit)"]
    Search --> FTSCheck{"FTS5 has results?"}
    FTSCheck -->|Yes| Hybrid["Hybrid mode<br/>rank_hybrid_candidates()<br/>+ rrf_score in output"]
    FTSCheck -->|No| Pure["Pure semantic mode<br/>rank_candidates()<br/>no rrf_score in output"]
```

| Aspect | Hybrid mode | Pure semantic mode |
|---|---|---|
| Triggered when | FTS5 returns ≥ 1 result | FTS5 returns 0 results or is unavailable |
| Scoring function | rank_hybrid_candidates() | rank_candidates() |
| Base relevance | KNN cosine or normalized RRF | Always max(0, 1 - distance) |
| rrf_score field | Present in every result | Absent |
| Best for | Mixed queries (semantic + exact terms) | Conceptual queries, synonyms |

All constants live in src/mcp_memory/scoring.py as module-level variables:

| Constant | Default | Purpose |
|---|---|---|
| EXPANSION_FACTOR | 3 | KNN over-retrieval multiplier. If limit=10, 30 candidates are retrieved for re-ranking |
| RRF_K | 60 | RRF smoothing constant. Standard value from the original paper. Higher values smooth rank differences; lower values amplify top positions |

These two constants directly control the hybrid search behavior:

  • EXPANSION_FACTOR: affects how many candidates each branch retrieves before merging. Higher values improve recall at the cost of computation. The re-ranking step then selects the best limit results.
  • RRF_K: controls how much RRF rewards top positions vs. lower ones. With k=60, the difference between rank 1 and rank 2 is 1/61 - 1/62 ≈ 0.000264 — small but cumulative across rankings.
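The RRF_K effect is easy to see numerically. The gap between adjacent ranks r and r+1 is 1/(k+r) − 1/(k+r+1), and larger k flattens it:

```python
def rank_gap(k: int, r: int) -> float:
    """Score difference between adjacent ranks r and r+1 under RRF."""
    return 1 / (k + r) - 1 / (k + r + 1)

# Smaller k amplifies top positions; larger k smooths them out
for k in (10, 60, 200):
    print(f"k={k}: gap(rank 1 → 2) = {rank_gap(k, 1):.6f}")
```

With k=10 the top-rank gap is roughly an order of magnitude larger than with k=200, which is why lowering k makes the fusion behave more like "winner takes most".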

In hybrid mode, each result includes the rrf_score field:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.67,
    "scoring": {
      "importance": 0.85,
      "temporal_factor": 0.99,
      "cooc_boost": 1.23
    },
    "distance": 0.42,
    "rrf_score": 0.018542
  }]
}
```

In pure semantic mode, the rrf_score field is absent:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.52,
    "scoring": {
      "importance": 0.70,
      "temporal_factor": 0.95,
      "cooc_boost": 0.80
    },
    "distance": 0.35
  }]
}
```

The presence or absence of rrf_score is the only structural difference in the output — you can use it to detect which mode was used without querying the engine directly.
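A client-side check based on that structural difference might look like this (the function name is illustrative, not part of the engine's API):

```python
def detect_mode(result: dict) -> str:
    """Infer which search mode produced a result from its shape."""
    # rrf_score present ⇒ the hybrid path ran; absent ⇒ pure semantic fallback
    return "hybrid" if "rrf_score" in result else "semantic"
```

This keeps clients decoupled from the engine's internals: they never need to query FTS5 availability directly.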