# Hybrid Search (FTS5 + RRF)
## Motivation

Pure KNN embedding search excels at finding semantically similar entities — synonyms, paraphrases, even cross-lingual matches. But it fails when users search by exact terms such as proper names, identifiers, or technical jargon. The vector representation of “FTS5” won’t reliably match a passage that literally contains “FTS5” — the model operates in meaning space, not token space.
FTS5 (BM25) is the mirror image: excellent at literal token matching but blind to semantic relationships. A search for “vector database” won’t find an entity whose observation says “embedding storage” unless the exact tokens overlap.
Hybrid search combines both signals to get the best of each world:
| Method | Strength | Weakness |
|---|---|---|
| KNN (semantic) | Synonyms, paraphrases, cross-lingual | Exact terms, rare words, IDs |
| FTS5 (BM25) | Exact terms, names, IDs, jargon | Semantic understanding, synonyms |
| Hybrid (RRF) | Both | Slightly more complex pipeline |
## The Six-Step Pipeline

When you call `search_semantic("FTS5 configuration")`, the engine runs a six-step pipeline that branches into parallel searches, merges them, and re-ranks with limbic signals:
```mermaid
graph TD
    Q["① Encode query<br/>engine.encode(text, task='query')<br/>→ float32[384]"]
    Q --> KNN["② Semantic branch (KNN)<br/>sqlite-vec KNN search<br/>→ 3 × limit candidates"]
    Q --> FTS["② Full-text branch (FTS5)<br/>BM25 on name, entity_type, obs_text<br/>→ 3 × limit candidates"]
    KNN -->|"[{entity_id, distance}]"| RRF["③ RRF Merge<br/>score = Σ 1/(k + rank_i)<br/>k = 60"]
    FTS -->|"[{entity_id, rank}]"| RRF
    RRF -->|"[{entity_id, rrf_score, dist?}]"| Limbic["④ Limbic Re-rank<br/>salience · temporal · cooc<br/>→ rank_hybrid_candidates()"]
    Limbic --> Hydrate["⑤ Hydrate entities<br/>get_entity_by_id()<br/>+ get_observations()"]
    Hydrate --> Track["⑥ Track access signals<br/>record_access()<br/>+ record_co_occurrences()"]
    Track --> Output["Output: [{name, entityType, observations,<br/>limbic_score, scoring, distance, rrf_score}]"]
```

### Step-by-step breakdown

**① Encode query** — The query text is prefixed with `"query: "` and encoded into a 384-dimensional vector by the ONNX embedding engine. This is the same encoding pipeline used for Semantic Search, operating in query mode.
**② Parallel retrieval** — Two independent searches run as separate branches against the same query:

- **Semantic (KNN):** The query vector is compared against all entity embeddings stored in sqlite-vec. To leave room for re-ranking, the engine retrieves `3 × limit` candidates (over-retrieval). The distance metric is cosine: `d = 1 - cos(A, B)`.
- **Full-text (FTS5):** The raw query text is searched against a BM25 index covering entity names, types, and observation content. It also retrieves `3 × limit` candidates.
**③ RRF Merge** — Results from both branches are merged using Reciprocal Rank Fusion (RRF). This step produces a unified ranking in which entities found by both methods receive a boost. See Reciprocal Rank Fusion below.

**④ Limbic Re-rank** — The merged candidates are scored by the Limbic System, which applies salience, temporal decay, and co-occurrence boosts. In hybrid mode this uses `rank_hybrid_candidates()` instead of `rank_candidates()`.

**⑤ Hydrate entities** — The top-K entity IDs are resolved into full entities with their names, types, and observations from the SQLite database.

**⑥ Track access signals** — After building the response, the engine records which entities were accessed and which appeared together (co-occurrences). This is best-effort and does not affect the returned results, but it feeds future limbic scoring.
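The six steps above can be sketched as a single orchestration function. This is a minimal sketch, not the real implementation: the helper names on `engine` (`knn_search`, `fts_search`, `reciprocal_rank_fusion`, `rank_hybrid_candidates`, `get_entity_by_id`, `record_access`) are hypothetical stand-ins for the engine's actual internals.

```python
# Sketch of the six-step hybrid pipeline. Helper names on `engine` are
# hypothetical; the real engine wires these steps to sqlite-vec, FTS5,
# and the Limbic System.
EXPANSION_FACTOR = 3

def search_semantic(query: str, limit: int, engine) -> list[dict]:
    # ① Encode the query in "query" mode → 384-dim vector
    vec = engine.encode(query, task="query")
    # ② Parallel retrieval with over-retrieval (3 × limit per branch)
    knn = engine.knn_search(vec, k=EXPANSION_FACTOR * limit)    # [{entity_id, distance}]
    fts = engine.fts_search(query, k=EXPANSION_FACTOR * limit)  # [{entity_id, rank}]
    # ③ Merge both rankings with RRF
    merged = engine.reciprocal_rank_fusion(knn, fts)
    # ④ Limbic re-rank, then keep the top `limit` candidates
    ranked = engine.rank_hybrid_candidates(merged)[:limit]
    # ⑤ Hydrate entity IDs into full entities
    results = [engine.get_entity_by_id(c["entity_id"]) | {"scoring": c} for c in ranked]
    # ⑥ Best-effort access tracking (does not change the returned results)
    engine.record_access([c["entity_id"] for c in ranked])
    return results
```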
## FTS5: Full-Text Index

The full-text index is a SQLite FTS5 virtual table:

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
```

| Column | Type | Description |
|---|---|---|
| `name` | TEXT | Entity name — directly searchable |
| `entity_type` | TEXT | Entity type — enables type-based queries (“Project”, “Session”) |
| `obs_text` | TEXT | All observations concatenated with `" \| "` separator |
| `rowid` | INTEGER | Implicit — corresponds to `entities.id` for JOIN-free lookups |
**Tokenizer:** `unicode61` — correctly handles accented characters (é, ñ, ü) and other Unicode. This is essential for a multilingual knowledge graph.
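The table can be exercised end to end with Python's built-in `sqlite3` module (the sample rows here are invented for illustration; only the table definition comes from the schema above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE VIRTUAL TABLE entity_fts '
    'USING fts5(name, entity_type, obs_text, tokenize="unicode61")'
)
# rowid mirrors entities.id; the rows themselves are invented samples
con.execute(
    "INSERT INTO entity_fts(rowid, name, entity_type, obs_text) VALUES (?, ?, ?, ?)",
    (1, "HybridSearch", "Feature", "FTS5 configuration | RRF merge"),
)
con.execute(
    "INSERT INTO entity_fts(rowid, name, entity_type, obs_text) VALUES (?, ?, ?, ?)",
    (2, "Café", "Project", "embedding storage"),
)

# BM25-ranked match on the literal token "FTS5"
rows = con.execute(
    "SELECT rowid FROM entity_fts WHERE entity_fts MATCH ? ORDER BY rank", ("FTS5",)
).fetchall()
print(rows)  # → [(1,)]
```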
### FTS synchronization

The FTS table is maintained at the code level, not via SQLite triggers. The `_sync_fts(entity_id)` method reads the entity’s current state from the DB and executes `INSERT OR REPLACE` in the FTS table:
| Operation | Method invoked | Behavior |
|---|---|---|
| `upsert_entity` | `_sync_fts(entity_id)` | INSERT OR REPLACE with current data |
| `add_observations` | `_sync_fts(entity_id)` | Rebuilds `obs_text` from DB |
| `delete_observations` | `_sync_fts(entity_id)` | Rebuilds `obs_text` from DB |
| `delete_entities` | Direct DELETE by rowid | Manual deletion (FTS5 doesn’t support CASCADE) |
| `init_db` (backfill) | `_backfill_fts()` | Populates from existing entities if FTS is empty |
## Reciprocal Rank Fusion (RRF)

The fusion of rankings uses the standard RRF formula:
```
rrf_score(d) = Σ_{i ∈ rankings} 1 / (k + rank_i(d))
```

Where:

- `rank_i(d)` = 1-based position of document `d` in ranking `i`
- `k` = smoothing constant (`RRF_K = 60`, standard value from the original paper)
Why it works: RRF doesn’t require scores to be comparable across systems. KNN produces cosine distances and FTS5 produces BM25 ranks — different scales, different distributions. RRF only cares about position in each ranking, making it ideal for heterogeneous retrieval.
### How RRF merges two rankings

| Scenario | Effect |
|---|---|
| Entity in both rankings | Receives score from both → boosted to the top |
| Entity in KNN only | Receives partial score from its KNN rank |
| Entity in FTS5 only | Receives partial score from its BM25 rank |
### Example

Given `limit = 10` → `3 × 10 = 30` candidates per branch:
| KNN rank | Entity | FTS5 rank | RRF score |
|---|---|---|---|
| 1 | Entity A | 3 | 1/(60+1) + 1/(60+3) = 0.0323 |
| 2 | Entity B | — | 1/(60+2) = 0.0161 |
| — | Entity C | 1 | 1/(60+1) = 0.0164 |
| 5 | Entity D | 2 | 1/(60+5) + 1/(60+2) = 0.0315 |
Entity A appears in both rankings at positions 1 and 3, receiving the highest combined score. Entity C appears only in FTS5 but at rank 1, so it edges out Entity B (KNN rank 2).
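The example can be verified with a minimal RRF implementation. This is a sketch of the standard formula, not the project's code; the placeholder IDs filling KNN ranks 3–4 are invented so the example entities land at the ranks shown in the table:

```python
# Minimal RRF over 1-based rankings of document ids (k = RRF_K = 60).
RRF_K = 60

def rrf_scores(*rankings: list[str], k: int = RRF_K) -> dict[str, float]:
    """Sum 1/(k + rank) across every ranking each id appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return scores

# Rankings from the example table; "x3"/"x4" are invented fillers for ranks 3-4
knn = ["A", "B", "x3", "x4", "D"]
fts = ["C", "D", "A"]
scores = rrf_scores(knn, fts)
print(round(scores["A"], 4))  # → 0.0323  (1/61 + 1/63, found by both branches)
print(round(scores["C"], 4))  # → 0.0164  (1/61, FTS5-only at rank 1)
```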
```python
def reciprocal_rank_fusion(
    semantic_results: list[dict],  # [{entity_id, distance}] ordered by distance
    fts_results: list[dict],       # [{entity_id, rank}] ordered by BM25 rank
    k: int = RRF_K,                # 60
) -> list[dict]:
    # Returns [{entity_id, rrf_score, distance | None}] sorted by rrf_score desc
    ...
```

## Hybrid Scoring: `rank_hybrid_candidates()`

When hybrid search is active, the Limbic System uses `rank_hybrid_candidates()` instead of `rank_candidates()`. The key difference is how base relevance is calculated:
| Entity source | base_relevance | Source |
|---|---|---|
| KNN + FTS (both) | `max(0, 1 - distance)` | Cosine similarity from KNN |
| KNN only | `max(0, 1 - distance)` | Cosine similarity from KNN |
| FTS only (no KNN) | `0.2 + 0.6 × norm_rrf` | RRF normalized to [0.2, 0.8] |
### Why different formulas?

Entities found only by FTS5 have no KNN distance (`distance = None`). Without a vector similarity signal, we can’t use the cosine formula. Instead, their RRF score is min-max normalized to the range [0.2, 0.8]:

```python
norm_rrf = (rrf_score - rrf_min) / rrf_range
base_relevance = 0.2 + 0.6 * norm_rrf  # → [0.2, 0.8]
```

The bounds prevent FTS-only entities from dominating (ceiling at 0.8) or being buried (floor at 0.2). The limbic components then apply on top:
```
limbic_score = base_relevance × (1 + β_sal × importance) × temporal × (1 + γ × cooc_boost)
```

This is the same composite formula used in pure semantic mode — only `base_relevance` changes.
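The two-path base relevance plus the composite formula can be sketched as follows. The candidate shape and the weight values (`beta_sal`, `gamma`) are assumptions for illustration; only the formulas themselves come from this section:

```python
# Sketch of hybrid base-relevance selection and the composite limbic score.
# beta_sal and gamma defaults are invented for illustration.
def base_relevance(cand: dict, rrf_min: float, rrf_range: float) -> float:
    if cand.get("distance") is not None:            # found by KNN (alone or with FTS)
        return max(0.0, 1.0 - cand["distance"])     # cosine similarity
    # FTS-only: min-max normalize the RRF score into [0.2, 0.8]
    norm_rrf = (cand["rrf_score"] - rrf_min) / rrf_range if rrf_range else 0.0
    return 0.2 + 0.6 * norm_rrf

def limbic_score(cand: dict, rrf_min: float, rrf_range: float,
                 beta_sal: float = 0.5, gamma: float = 0.3) -> float:
    base = base_relevance(cand, rrf_min, rrf_range)
    return (base
            * (1 + beta_sal * cand["importance"])   # salience boost
            * cand["temporal_factor"]               # temporal decay
            * (1 + gamma * cand["cooc_boost"]))     # co-occurrence boost

knn_cand = {"distance": 0.42, "importance": 0.85, "temporal_factor": 0.99, "cooc_boost": 1.23}
fts_cand = {"distance": None, "rrf_score": 0.0164, "importance": 0.5,
            "temporal_factor": 1.0, "cooc_boost": 0.0}
print(round(base_relevance(knn_cand, 0.015, 0.017), 2))  # → 0.58 (i.e. 1 - 0.42)
```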
## Hybrid vs. Pure Semantic

The pipeline automatically chooses between hybrid and pure semantic mode based on FTS5 availability:

```mermaid
graph TD
    Search["search_semantic(query, limit)"]
    Search --> FTSCheck{"FTS5 has results?"}
    FTSCheck -->|Yes| Hybrid["Hybrid mode<br/>rank_hybrid_candidates()<br/>+ rrf_score in output"]
    FTSCheck -->|No| Pure["Pure semantic mode<br/>rank_candidates()<br/>no rrf_score in output"]
```

| Aspect | Hybrid mode | Pure semantic mode |
|---|---|---|
| Triggered when | FTS5 returns ≥1 result | FTS5 returns 0 results or is unavailable |
| Scoring function | `rank_hybrid_candidates()` | `rank_candidates()` |
| Base relevance | KNN cosine or normalized RRF | Always `max(0, 1 - distance)` |
| `rrf_score` field | Present in every result | Absent |
| Best for | Mixed queries (semantic + exact terms) | Conceptual queries, synonyms |
## Tuneable Parameters

All constants live in `src/mcp_memory/scoring.py` as module-level variables:
| Constant | Default | Purpose |
|---|---|---|
| `EXPANSION_FACTOR` | 3 | KNN over-retrieval multiplier. If `limit=10`, 30 candidates are retrieved for re-ranking |
| `RRF_K` | 60 | RRF smoothing constant. Standard value from the original paper. Higher values smooth rank differences; lower values amplify top positions |
These two constants directly control the hybrid search behavior:
- `EXPANSION_FACTOR`: affects how many candidates each branch retrieves before merging. Higher values improve recall at the cost of computation. The re-ranking step then selects the best `limit` results.
- `RRF_K`: controls how much RRF rewards top positions vs. lower ones. With `k=60`, the difference between rank 1 and rank 2 is `1/61 - 1/62 ≈ 0.000264` — small but cumulative across rankings.
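The effect of `RRF_K` on that rank-1 vs. rank-2 gap is easy to tabulate directly from the formula (pure arithmetic, no project code involved):

```python
# Gap in RRF contribution between adjacent top ranks for a given k:
# smaller k amplifies top positions, larger k flattens them.
def rank_gap(k: int) -> float:
    return 1 / (k + 1) - 1 / (k + 2)

for k in (10, 60, 200):
    print(k, f"{rank_gap(k):.6f}")
```

A small `k` (e.g. 10) makes rank 1 dominate; the default `k=60` keeps contributions close enough that agreement across both branches, not a single top hit, drives the merged ranking.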
## Output

### Hybrid mode (KNN + FTS5)

Each result includes the `rrf_score` field:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.67,
    "scoring": { "importance": 0.85, "temporal_factor": 0.99, "cooc_boost": 1.23 },
    "distance": 0.42,
    "rrf_score": 0.018542
  }]
}
```

### Pure semantic mode (KNN only)
The `rrf_score` field is absent:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.52,
    "scoring": { "importance": 0.70, "temporal_factor": 0.95, "cooc_boost": 0.80 },
    "distance": 0.35
  }]
}
```

The presence or absence of `rrf_score` is the only structural difference in the output — you can use it to detect which mode was used without querying the engine directly.
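For a consumer, that check is a one-liner over the result dict (the sample results below are abbreviated from the examples above):

```python
# Detect which mode produced a result from its shape alone.
def search_mode(result: dict) -> str:
    return "hybrid" if "rrf_score" in result else "pure-semantic"

hybrid_result = {"name": "CachorroSpace", "distance": 0.42, "rrf_score": 0.018542}
pure_result = {"name": "CachorroSpace", "distance": 0.35}
print(search_mode(hybrid_result))  # → hybrid
print(search_mode(pure_result))    # → pure-semantic
```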