
Hybrid Search (FTS5 + RRF)

Pure KNN embedding search excels at finding semantically similar entities — synonyms, paraphrases, even cross-lingual matches. But it fails when users search by exact terms such as proper names, identifiers, or technical jargon. The vector representation of “FTS5” won’t reliably match a passage that literally contains “FTS5” — the model operates in meaning space, not token space.

FTS5 (BM25) is the mirror image: excellent at literal token matching but blind to semantic relationships. A search for “vector database” won’t find an entity whose observation says “embedding storage” unless the exact tokens overlap.

Hybrid search combines both signals to get the best of both worlds:

| Method | Strength | Weakness |
|---|---|---|
| KNN (semantic) | Synonyms, paraphrases, cross-lingual | Exact terms, rare words, IDs |
| FTS5 (BM25) | Exact terms, names, IDs, jargon | Semantic understanding, synonyms |
| Hybrid (RRF) | Both | Slightly more complex pipeline |

When you call search_semantic("FTS5 configuration"), the engine runs a six-step pipeline that branches into parallel searches, merges them, and re-ranks with limbic signals:

```mermaid
graph TD
    Q["① Encode query<br/>engine.encode(text, task='query')<br/>→ float32[384]"]
    Q --> KNN["② Semantic branch (KNN)<br/>sqlite-vec KNN search<br/>→ 3 × limit candidates"]
    Q --> FTS["② Full-text branch (FTS5)<br/>BM25 on name, entity_type, obs_text<br/>→ 3 × limit candidates"]
    KNN -->|"[{entity_id, distance}]"| RRF["③ RRF Merge<br/>score = Σ 1/(k + rank_i)<br/>k = 60"]
    FTS -->|"[{entity_id, rank}]"| RRF
    RRF -->|"[{entity_id, rrf_score, dist?}]"| Limbic["④ Limbic Re-rank<br/>salience · temporal · cooc<br/>→ rank_hybrid_candidates()"]
    Limbic --> Hydrate["⑤ Hydrate entities<br/>get_entity_by_id()<br/>+ get_observations()"]
    Hydrate --> Track["⑥ Track access signals<br/>record_access()<br/>+ record_co_occurrences()"]
    Track --> Output["Output: [{name, entityType, observations,<br/>limbic_score, scoring, distance, rrf_score}]"]
```

① Encode query — The query text is prefixed with "query: " and encoded into a 384-dimensional vector by the ONNX embedding engine. This is the same encoding pipeline used for Semantic Search, operating in query mode.

② Parallel retrieval — Two independent searches run as separate branches against the same query:

  • Semantic (KNN): The query vector is compared against all entity embeddings stored in sqlite-vec. To leave room for re-ranking, the engine retrieves 3 × limit candidates (over-retrieval). The distance metric is cosine: d = 1 - cos(A, B).
  • Full-text (FTS5): The raw query text is searched against a BM25 index covering entity names, types, and observation content. Also retrieves 3 × limit candidates.
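The FTS5 branch can be sketched with Python's bundled sqlite3 module (assuming its SQLite build ships FTS5, as most modern CPython builds do). The table schema follows the one shown later on this page; the sample data is illustrative:

```python
import sqlite3

# Sketch of the FTS5 branch (②) against an in-memory database.
# In the real engine this runs alongside the sqlite-vec KNN branch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE VIRTUAL TABLE entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
INSERT INTO entity_fts(rowid, name, entity_type, obs_text) VALUES
  (1, 'FTS5 configuration', 'Doc', 'tokenizer options | unicode61'),
  (2, 'Vector search', 'Doc', 'embedding storage | KNN');
""")

limit = 10
fts_results = [
    {"entity_id": row[0], "rank": i + 1}
    for i, row in enumerate(conn.execute(
        "SELECT rowid FROM entity_fts WHERE entity_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        ("FTS5", 3 * limit),  # over-retrieve 3 × limit candidates
    ))
]
```

Only entity 1 contains the literal token "FTS5", so `fts_results` holds a single candidate at rank 1. Note that `ORDER BY rank` sorts by BM25 relevance, best match first.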

③ RRF Merge — Results from both branches are merged using Reciprocal Rank Fusion (RRF). This step produces a unified ranking where entities found by both methods receive a boost. See Reciprocal Rank Fusion below.

④ Limbic Re-rank — The merged candidates are scored by the Limbic System, which applies salience, temporal decay, and co-occurrence boosts. In hybrid mode this uses rank_hybrid_candidates() instead of rank_candidates().

⑤ Hydrate entities — The top-K entity IDs are resolved into full entities with their names, types, and observations from the SQLite database.

⑥ Track access signals — After building the response, the engine records which entities were accessed and which appeared together (co-occurrences). This is best-effort and does not affect the returned results, but feeds future limbic scoring.

The full-text index is a SQLite FTS5 virtual table:

```sql
CREATE VIRTUAL TABLE IF NOT EXISTS entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
```

| Column | Type | Description |
|---|---|---|
| name | TEXT | Entity name — directly searchable |
| entity_type | TEXT | Entity type — enables type-based queries (“Project”, “Session”) |
| obs_text | TEXT | All observations concatenated with `" \| "` separator |
| rowid | INTEGER | Implicit — corresponds to entities.id for JOIN-free lookups |

Tokenizer: unicode61 — correctly handles accented characters (é, ñ, ü) and other Unicode. This is essential for a multilingual knowledge graph.
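A quick check of the unicode61 tokenizer using Python's bundled sqlite3 (assuming an FTS5-enabled SQLite build). With its default diacritic removal, an unaccented query should match accented text:

```python
import sqlite3

# unicode61's default remove_diacritics setting folds "café" to "cafe"
# at indexing time, so the plain-ASCII query still matches.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE VIRTUAL TABLE demo USING fts5(txt, tokenize="unicode61")')
conn.execute("INSERT INTO demo VALUES ('café con leche')")
hits = conn.execute("SELECT txt FROM demo WHERE demo MATCH 'cafe'").fetchall()
```

This diacritic folding is what makes the index robust for multilingual content without per-language configuration.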

The FTS table is maintained at the code level, not via SQLite triggers. The _sync_fts(entity_id) method reads the entity’s current state from the DB and executes INSERT OR REPLACE in the FTS table:

| Operation | Method invoked | Behavior |
|---|---|---|
| upsert_entity | _sync_fts(entity_id) | INSERT OR REPLACE with current data |
| add_observations | _sync_fts(entity_id) | Rebuilds obs_text from DB |
| delete_observations | _sync_fts(entity_id) | Rebuilds obs_text from DB |
| delete_entities | Direct DELETE by rowid | Manual deletion (FTS5 doesn’t support CASCADE) |
| init_db (backfill) | _backfill_fts() | Populates from existing entities if FTS is empty |
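A hypothetical sketch of the code-level sync, assuming `entities(id, name, entity_type)` and `observations(entity_id, content)` source tables (those names are illustrative; only `entity_fts` matches the schema shown above):

```python
import sqlite3

def sync_fts(conn: sqlite3.Connection, entity_id: int) -> None:
    """Read the entity's current state and upsert its FTS row (sketch)."""
    row = conn.execute(
        "SELECT name, entity_type FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        return  # entity gone; FTS row is deleted separately by rowid
    obs_text = " | ".join(
        r[0] for r in conn.execute(
            "SELECT content FROM observations WHERE entity_id = ? ORDER BY id",
            (entity_id,),
        )
    )
    # INSERT OR REPLACE keyed on rowid keeps exactly one FTS row per entity
    conn.execute(
        "INSERT OR REPLACE INTO entity_fts(rowid, name, entity_type, obs_text) "
        "VALUES (?, ?, ?, ?)",
        (entity_id, row[0], row[1], obs_text),
    )

# Demo: minimal schema, one entity, one observation
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities(id INTEGER PRIMARY KEY, name TEXT, entity_type TEXT);
CREATE TABLE observations(id INTEGER PRIMARY KEY, entity_id INTEGER, content TEXT);
CREATE VIRTUAL TABLE entity_fts USING fts5(name, entity_type, obs_text, tokenize="unicode61");
INSERT INTO entities VALUES (1, 'CachorroSpace', 'Project');
INSERT INTO observations(entity_id, content) VALUES (1, 'Built with Astro Starlight');
""")
sync_fts(conn, 1)
synced = conn.execute("SELECT name, obs_text FROM entity_fts WHERE rowid = 1").fetchone()
```

Keeping the sync in code rather than triggers makes the rebuild logic explicit and testable, at the cost of having to call it from every mutating operation.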

The fusion of rankings uses the standard RRF formula:

rrf_score(d) = Σ_{i ∈ rankings} 1 / (k + rank_i(d))

Where:

  • rank_i(d) = 1-based position of document d in ranking i
  • k = smoothing constant (RRF_K = 60, standard value from the original paper)

Why it works: RRF doesn’t require scores to be comparable across systems. KNN produces cosine distances and FTS5 produces BM25 ranks — different scales, different distributions. RRF only cares about position in each ranking, making it ideal for heterogeneous retrieval.

| Scenario | Effect |
|---|---|
| Entity in both rankings | Receives score from both → boosted to the top |
| Entity in KNN only | Receives partial score from its KNN rank |
| Entity in FTS5 only | Receives partial score from its BM25 rank |

Given limit = 10, each branch retrieves 3 × 10 = 30 candidates:

| KNN rank | Entity | FTS5 rank | RRF score |
|---|---|---|---|
| 1 | Entity A | 3 | 1/(60+1) + 1/(60+3) ≈ 0.0323 |
| 2 | Entity B | — | 1/(60+2) ≈ 0.0161 |
| — | Entity C | 1 | 1/(60+1) ≈ 0.0164 |
| 5 | Entity D | 2 | 1/(60+5) + 1/(60+2) ≈ 0.0315 |

Entity A appears in both rankings at positions 1 and 3, receiving the highest combined score. Entity C appears only in FTS5 but at rank 1, so it edges out Entity B (KNN rank 2).

```python
def reciprocal_rank_fusion(
    semantic_results: list[dict],  # [{entity_id, distance}] ordered by distance
    fts_results: list[dict],       # [{entity_id, rank}] ordered by BM25 rank
    k: int = RRF_K,                # 60
) -> list[dict]:
    """Return [{entity_id, rrf_score, distance | None}] sorted by rrf_score desc."""
    ...
```

When hybrid search is active, the Limbic System uses rank_hybrid_candidates() instead of rank_candidates(). The key difference is how base relevance is calculated:

| Entity source | base_relevance | Source |
|---|---|---|
| KNN + FTS (both) | max(0, 1 - distance) | Cosine similarity from KNN |
| KNN only | max(0, 1 - distance) | Cosine similarity from KNN |
| FTS only (no KNN) | 0.2 + 0.6 × norm_rrf | RRF normalized to [0.2, 0.8] |

Entities found only by FTS5 have no KNN distance (distance = None). Without a vector similarity signal, we can’t use the cosine formula. Instead, their RRF score is normalized min-max to the range [0.2, 0.8]:

```python
norm_rrf = (rrf_score - rrf_min) / rrf_range
base_relevance = 0.2 + 0.6 * norm_rrf  # → [0.2, 0.8]
```

The bounds prevent FTS-only entities from dominating (ceiling at 0.8) or being buried (floor at 0.2). The limbic components then apply on top:

limbic_score = base_relevance × (1 + β_sal × importance) × temporal × (1 + γ × cooc_boost)

This is the same composite formula used in pure semantic mode — only base_relevance changes.
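The base-relevance rules above can be sketched as a single function. The degenerate case of a zero RRF spread (only one FTS-only candidate) is an assumption of this sketch, mapped to the midpoint:

```python
def base_relevance(distance, rrf_score, rrf_min, rrf_range):
    """Base relevance per the hybrid rules (sketch, not the engine's code)."""
    if distance is not None:
        # Entity was seen by KNN: cosine similarity, clamped at 0
        return max(0.0, 1.0 - distance)
    if rrf_range == 0:
        # Assumption: single FTS-only candidate → midpoint of [0.2, 0.8]
        return 0.5
    # FTS-only: min-max-normalized RRF score mapped into [0.2, 0.8]
    norm_rrf = (rrf_score - rrf_min) / rrf_range
    return 0.2 + 0.6 * norm_rrf
```

The limbic multipliers (salience, temporal, co-occurrence) then apply on top of this value regardless of which branch produced it.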

The pipeline automatically chooses between hybrid and pure semantic mode based on FTS5 availability:

```mermaid
graph TD
    Search["search_semantic(query, limit)"]
    Search --> FTSCheck{"FTS5 has results?"}
    FTSCheck -->|Yes| Hybrid["Hybrid mode<br/>rank_hybrid_candidates()<br/>+ rrf_score in output"]
    FTSCheck -->|No| Pure["Pure semantic mode<br/>rank_candidates()<br/>no rrf_score in output"]
```

| Aspect | Hybrid mode | Pure semantic mode |
|---|---|---|
| Triggered when | FTS5 returns ≥ 1 result | FTS5 returns 0 results or is unavailable |
| Scoring function | rank_hybrid_candidates() | rank_candidates() |
| Base relevance | KNN cosine or normalized RRF | Always max(0, 1 - distance) |
| rrf_score field | Present in every result | Absent |
| Best for | Mixed queries (semantic + exact terms) | Conceptual queries, synonyms |

All constants live in src/mcp_memory/scoring.py as module-level variables:

| Constant | Default | Purpose |
|---|---|---|
| EXPANSION_FACTOR | 3 | KNN over-retrieval multiplier. If limit=10, 30 candidates are retrieved for re-ranking |
| RRF_K | 60 | RRF smoothing constant. Standard value from the original paper. Higher values smooth rank differences; lower values amplify top positions |

These two constants directly control the hybrid search behavior:

  • EXPANSION_FACTOR: affects how many candidates each branch retrieves before merging. Higher values improve recall at the cost of computation. The re-ranking step then selects the best limit results.
  • RRF_K: controls how much RRF rewards top positions vs. lower ones. With k=60, the difference between rank 1 and rank 2 is 1/61 - 1/62 ≈ 0.000264 — small but cumulative across rankings.
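The RRF_K effect is easy to see numerically. The gap between adjacent ranks r and r+1 is 1/(k+r) − 1/(k+r+1), and larger k flattens it:

```python
def rank_gap(k: int, r: int) -> float:
    """Score difference between adjacent ranks r and r+1 under RRF."""
    return 1 / (k + r) - 1 / (k + r + 1)

# Smaller k amplifies top positions; larger k smooths them out
for k in (10, 60, 200):
    print(f"k={k}: gap(rank 1 → 2) = {rank_gap(k, 1):.6f}")
```

With k=10 the top-rank gap is roughly an order of magnitude larger than with k=200, which is why lowering k makes the fusion behave more like "winner takes most".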

In hybrid mode, each result includes the rrf_score field:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.67,
    "scoring": {
      "importance": 0.85,
      "temporal_factor": 0.99,
      "cooc_boost": 1.23
    },
    "distance": 0.42,
    "rrf_score": 0.018542
  }]
}
```

In pure semantic mode, the rrf_score field is absent:

```json
{
  "results": [{
    "name": "CachorroSpace",
    "entityType": "Project",
    "observations": ["Built with Astro Starlight", "Accent: teal (#2dd4bf)"],
    "limbic_score": 0.52,
    "scoring": {
      "importance": 0.70,
      "temporal_factor": 0.95,
      "cooc_boost": 0.80
    },
    "distance": 0.35
  }]
}
```

The presence or absence of rrf_score is the only structural difference in the output — you can use it to detect which mode was used without querying the engine directly.
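A client-side check based on that structural difference might look like this (the function name is illustrative, not part of the engine's API):

```python
def detect_mode(result: dict) -> str:
    """Infer which search mode produced a result from its shape."""
    # rrf_score present ⇒ the hybrid path ran; absent ⇒ pure semantic fallback
    return "hybrid" if "rrf_score" in result else "semantic"
```

This keeps clients decoupled from the engine's internals: they never need to query FTS5 availability directly.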