This page documents the internal data model, database schema, Pydantic validation layer, and server configuration of MCP Memory v2. It is intended for contributors who need to understand or modify the implementation.
Server Configuration
| Setting | Value |
|---|---|
| Entry point | mcp-memory → mcp_memory.server:main |
| Transport | stdio |
| Logs | stderr |
| MCP name | "memory" |
| Default DB path | ~/.config/opencode/mcp-memory/memory.db |
| Model cache | ~/.cache/mcp-memory-v2/models/ |
The database directory is created automatically on first run. Model files (ONNX + tokenizer) are downloaded via scripts/download_model.py.
Entity-Relationship Diagram
erDiagram
entities ||--o{ observations : "has"
entities ||--o{ relations : "from"
entities ||--o{ relations : "to"
entities ||--|| entity_embeddings : "1:1 (rowid)"
entities ||--|| entity_access : "tracks"
entities ||--o{ entity_access_log : "daily"
entities ||--o{ co_occurrences : "co-occurs"
entities ||--|| entity_fts : "indexed (FTS5)"
entities ||--o{ search_results : "ranked"
search_events ||--o{ search_results : "produces"
search_events ||--o{ implicit_feedback : "receives"
reflections ||--|| reflection_fts : "indexed (FTS5)"
reflections ||--|| reflection_embeddings : "vector"
entities {
INTEGER id PK
TEXT name UK
TEXT entity_type
TEXT status
TEXT created_at
TEXT updated_at
}
observations {
INTEGER id PK
INTEGER entity_id FK
TEXT content
TEXT kind
INTEGER supersedes FK
TEXT superseded_at
INTEGER similarity_flag
TEXT created_at
}
relations {
INTEGER id PK
INTEGER from_entity FK
INTEGER to_entity FK
TEXT relation_type
TEXT context
INTEGER active
TEXT ended_at
TEXT created_at
}
entity_embeddings {
INTEGER rowid PK
FLOAT embedding_384
}
entity_access {
INTEGER entity_id PK_FK
INTEGER access_count
TEXT last_access
}
entity_access_log {
INTEGER entity_id PK_FK
TEXT access_date
INTEGER access_count
}
co_occurrences {
INTEGER entity_a_id FK
INTEGER entity_b_id FK
INTEGER co_count
TEXT last_co
}
entity_fts {
INTEGER rowid PK
TEXT name
TEXT entity_type
TEXT obs_text
}
search_events {
INTEGER event_id PK
TEXT query_text
TEXT query_hash
TEXT timestamp
INTEGER treatment
INTEGER k_limit
INTEGER num_results
REAL duration_ms
TEXT engine_used
}
search_results {
INTEGER result_id PK
INTEGER event_id FK
INTEGER entity_id FK
TEXT entity_name
INTEGER rank
REAL limbic_score
REAL cosine_sim
REAL importance
REAL temporal
REAL cooc_boost
INTEGER baseline_rank
}
implicit_feedback {
INTEGER feedback_id PK
INTEGER event_id FK
INTEGER entity_id
INTEGER re_accessed
INTEGER access_delta
TEXT session_id
}
reflections {
INTEGER id PK
TEXT target_type
INTEGER target_id
TEXT author
TEXT content
TEXT mood
TEXT created_at
}
reflection_fts {
INTEGER rowid PK
TEXT content
}
reflection_embeddings {
INTEGER rowid PK
FLOAT embedding_384
}
db_metadata {
TEXT key PK
TEXT value
}
Database Schema
MCP Memory v2 stores a knowledge graph in SQLite composed of three core elements — entities, observations, and relations — extended by a fourth layer of vector embeddings (via sqlite-vec) for semantic search.
- Entities are nodes in the graph.
- Observations are facts attached to an entity.
- Relations connect two entities with a typed link.
- Embeddings project each entity into a 384-dimensional vector space for cosine similarity search.
Additional auxiliary tables power the Limbic Scoring system (entity_access, entity_access_log, co_occurrences), full-text search (entity_fts), A/B testing and implicit feedback (search_events, search_results, implicit_feedback), and narrative reflections (reflections, reflection_fts, reflection_embeddings).
entities
The primary table of the knowledge graph. Each row represents a node with a unique name.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
name | TEXT | NOT NULL UNIQUE | Human-readable entity name. Business key — no two entities may share a name |
entity_type | TEXT | NOT NULL DEFAULT 'Generic' | Entity classification (e.g. Sesion, Componente, Sistema) |
status | TEXT | NOT NULL DEFAULT 'activo' | Entity lifecycle status (e.g. activo, archivado) |
created_at | TEXT | NOT NULL DEFAULT (datetime('now')) | Creation timestamp in ISO-8601 format |
updated_at | TEXT | NOT NULL DEFAULT (datetime('now')) | Last update timestamp |
observations
Facts or data points attached to an entity. An entity may have zero or many observations.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
entity_id | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the parent entity. Cascade deletes observations when the entity is removed |
content | TEXT | NOT NULL | Free-text content of the observation |
kind | TEXT | NOT NULL DEFAULT 'generic' | Observation category (e.g. generic, milestone, risk) |
supersedes | INTEGER | REFERENCES observations(id) | FK to the observation this one replaces |
superseded_at | TEXT | Timestamp when this observation was marked as superseded | |
similarity_flag | INTEGER | NOT NULL DEFAULT 0 | Flag for duplicate-detection workflows |
created_at | TEXT | NOT NULL DEFAULT (datetime('now')) | Creation timestamp |
:::note[CASCADE behavior]
ON DELETE CASCADE ensures referential integrity: deleting an entity automatically removes all its observations, preventing orphans.
:::
relations
Edges connecting two entities with a semantic relation type.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
from_entity | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the source entity |
to_entity | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the target entity |
relation_type | TEXT | NOT NULL | Type of relationship (e.g. uses, depends_on, part_of) |
context | TEXT | Optional narrative context for the relation | |
active | INTEGER | NOT NULL DEFAULT 1 | Soft-delete flag (1 = active, 0 = ended) |
ended_at | TEXT | Timestamp when the relation was marked as ended | |
created_at | TEXT | NOT NULL DEFAULT (datetime('now')) | Creation timestamp |
:::note[Uniqueness constraint]
UNIQUE(from_entity, to_entity, relation_type) prevents duplicate relations of the same type between a given pair of entities.
:::
entity_embeddings (Virtual)
Virtual table implemented with the sqlite-vec extension (vec0). Stores the embedding vector for each entity to power semantic search.
| Column | Type | Description |
|---|---|---|
embedding | float[384] | 384-dimensional vector generated by the ONNX model. Distance metric: cosine |
rowid | INTEGER (implicit) | Corresponds to entities.id. Links the embedding to its entity |
:::caution[vec0 and CASCADE]
Virtual vec0 tables do not participate in SQLite’s ON DELETE CASCADE. When deleting an entity, embeddings must be removed manually before the entity row. The codebase handles this in delete_entities_by_names:
# 1. Delete embeddings first (vec0 has no CASCADE support)
self.db.execute(
f"DELETE FROM entity_embeddings WHERE rowid IN ({id_placeholders})", ids
)
# 2. Delete entity (CASCADE handles observations & relations)
self.db.execute(
f"DELETE FROM entities WHERE id IN ({id_placeholders})", ids
)
If you delete entities via raw SQL, clean up embeddings first to avoid orphaned records. :::
The rowid-based link enables direct JOINs without an explicit FK column:
SELECT e.name, e.entity_type
FROM entities e
JOIN entity_embeddings ee ON e.id = ee.rowid
WHERE ee.embedding MATCH ?
ORDER BY distance;
db_metadata
Auxiliary key-value table for system metadata (schema version, last migration timestamp, internal configuration).
| Column | Type | Constraints | Description |
|---|---|---|---|
key | TEXT | PRIMARY KEY | Unique metadata key |
value | TEXT | NOT NULL | Associated value |
entity_access
Support table for the Limbic Scoring system. Records how often and how recently each entity appears in search_semantic results.
| Column | Type | Constraints | Description |
|---|---|---|---|
entity_id | INTEGER | PRIMARY KEY REFERENCES entities(id) ON DELETE CASCADE | FK to the entity. One row per entity |
access_count | INTEGER | NOT NULL DEFAULT 1 | Number of times the entity appeared in semantic search results |
last_access | TEXT | NOT NULL DEFAULT (datetime('now')) | Timestamp of last access (used for temporal decay) |
co_occurrences
Support table for the Limbic Scoring system. Records how often two entities appear together in search_semantic results.
| Column | Type | Constraints | Description |
|---|---|---|---|
entity_a_id | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the entity with the lower ID (canonical ordering) |
entity_b_id | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the entity with the higher ID |
co_count | INTEGER | NOT NULL DEFAULT 1 | Number of recorded co-occurrences |
last_co | TEXT | NOT NULL DEFAULT (datetime('now')) | Timestamp of the last co-occurrence |
:::note[Canonical ordering]
The composite primary key (entity_a_id, entity_b_id) enforces entity_a_id < entity_b_id always. This prevents duplicate pairs like (A, B) and (B, A).
:::
entity_fts (FTS5 Virtual)
FTS5 virtual table for full-text search. Indexes entity names, types, and concatenated observation text.
| Column | Type | Description |
|---|---|---|
name | TEXT | Entity name (searchable) |
entity_type | TEXT | Entity type (searchable) |
obs_text | TEXT | All observations concatenated with " | " as separator |
rowid | INTEGER (implicit) | Corresponds to entities.id |
The tokenizer is unicode61, which correctly handles accented characters (é, ñ, ü) and other Unicode. Synchronization is code-level (not SQLite triggers): _sync_fts() is called manually in upsert_entity, add_observations, and delete_observations. On init_db(), if the FTS table is empty but entities exist, _backfill_fts() populates it.
entity_access_log
Day-level access tracking used by the consolidation pipeline. One row per entity per calendar day.
| Column | Type | Constraints | Description |
|---|---|---|---|
entity_id | INTEGER | NOT NULL REFERENCES entities(id) ON DELETE CASCADE | FK to the entity |
access_date | TEXT | NOT NULL | Calendar date of access (ISO-8601) |
access_count | INTEGER | NOT NULL DEFAULT 1 | Number of accesses on that date |
:::note[Composite key]
PRIMARY KEY (entity_id, access_date) enforces one row per entity per day.
:::
search_events
Query logging table for A/B testing and search-quality analysis.
| Column | Type | Constraints | Description |
|---|---|---|---|
event_id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
query_text | TEXT | NOT NULL | Original query string |
query_hash | TEXT | NOT NULL | Hash of the query for deduplication |
timestamp | TEXT | NOT NULL DEFAULT (datetime('now')) | Query execution timestamp |
treatment | INTEGER | NOT NULL | A/B test arm identifier |
k_limit | INTEGER | NOT NULL | Maximum results requested |
num_results | INTEGER | NOT NULL | Actual number of results returned |
duration_ms | REAL | Query latency in milliseconds | |
engine_used | TEXT | NOT NULL | Search engine variant used |
search_results
Per-entity ranking data produced by each search_semantic call.
| Column | Type | Constraints | Description |
|---|---|---|---|
result_id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
event_id | INTEGER | NOT NULL REFERENCES search_events(event_id) | FK to the parent search event |
entity_id | INTEGER | NOT NULL | FK to the ranked entity |
entity_name | TEXT | NOT NULL | Snapshot of entity name at query time |
rank | INTEGER | NOT NULL | Position in result list (1-based) |
limbic_score | REAL | Final blended score | |
cosine_sim | REAL | Raw cosine similarity | |
importance | REAL | Access-count component | |
temporal | REAL | Recency-decay component | |
cooc_boost | REAL | Co-occurrence boost | |
baseline_rank | INTEGER | Rank without limbic re-ranking |
:::note[Uniqueness constraint]
UNIQUE(event_id, entity_id) prevents duplicate rows for the same entity in one search.
:::
implicit_feedback
Tracks re-access events used to compute NDCG and other search-quality metrics.
| Column | Type | Constraints | Description |
|---|---|---|---|
feedback_id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
event_id | INTEGER | NOT NULL REFERENCES search_events(event_id) | FK to the original search event |
entity_id | INTEGER | NOT NULL | FK to the re-accessed entity |
re_accessed | INTEGER | NOT NULL DEFAULT 0 | Whether the entity was re-accessed (1 = yes) |
access_delta | INTEGER | Seconds between search and re-access | |
session_id | TEXT | Optional session identifier |
reflections
Narrative reflections that give context and meaning to memories.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique internal identifier |
target_type | TEXT | NOT NULL | What this reflection targets (entity, session, relation, global) |
target_id | INTEGER | ID of the target entity/relation (NULL for global) | |
author | TEXT | NOT NULL DEFAULT 'sofia' | Who wrote the reflection (nolan or sofia) |
content | TEXT | NOT NULL | Free-text reflection body |
mood | TEXT | Optional mood tag (frustracion, satisfaccion, etc.) | |
created_at | TEXT | NOT NULL DEFAULT (datetime('now')) | Creation timestamp |
reflection_fts (FTS5 Virtual)
FTS5 virtual table indexing reflection content for full-text search.
| Column | Type | Description |
|---|---|---|
content | TEXT | Reflection text (searchable) |
rowid | INTEGER (implicit) | Corresponds to reflections.id |
reflection_embeddings (Virtual)
Virtual table implemented with sqlite-vec (vec0). Stores vector embeddings for reflections to enable semantic search over narrative content.
| Column | Type | Description |
|---|---|---|
embedding | float[384] | 384-dimensional vector. Distance metric: cosine |
rowid | INTEGER (implicit) | Corresponds to reflections.id. Links the embedding to its reflection |
Full SQL Schema
The complete DDL including all tables, virtual tables, and indexes:
CREATE TABLE entities (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
entity_type TEXT NOT NULL DEFAULT 'Generic',
status TEXT NOT NULL DEFAULT 'activo',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
content TEXT NOT NULL,
kind TEXT NOT NULL DEFAULT 'generic',
supersedes INTEGER REFERENCES observations(id),
superseded_at TEXT,
similarity_flag INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE relations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
from_entity INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
to_entity INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
relation_type TEXT NOT NULL,
context TEXT,
active INTEGER NOT NULL DEFAULT 1,
ended_at TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(from_entity, to_entity, relation_type)
);
CREATE VIRTUAL TABLE entity_embeddings
USING vec0(embedding float[384] distance_metric=cosine);
CREATE TABLE db_metadata (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
-- Limbic scoring tables
CREATE TABLE entity_access (
entity_id INTEGER PRIMARY KEY REFERENCES entities(id) ON DELETE CASCADE,
access_count INTEGER NOT NULL DEFAULT 1,
last_access TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE entity_access_log (
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
access_date TEXT NOT NULL,
access_count INTEGER NOT NULL DEFAULT 1,
PRIMARY KEY (entity_id, access_date)
);
CREATE TABLE co_occurrences (
entity_a_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
entity_b_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
co_count INTEGER NOT NULL DEFAULT 1,
last_co TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (entity_a_id, entity_b_id)
);
-- Search quality tables
CREATE TABLE search_events (
event_id INTEGER PRIMARY KEY AUTOINCREMENT,
query_text TEXT NOT NULL,
query_hash TEXT NOT NULL,
timestamp TEXT NOT NULL DEFAULT (datetime('now')),
treatment INTEGER NOT NULL,
k_limit INTEGER NOT NULL,
num_results INTEGER NOT NULL,
duration_ms REAL,
engine_used TEXT NOT NULL
);
CREATE TABLE search_results (
result_id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id INTEGER NOT NULL REFERENCES search_events(event_id),
entity_id INTEGER NOT NULL,
entity_name TEXT NOT NULL,
rank INTEGER NOT NULL,
limbic_score REAL,
cosine_sim REAL,
importance REAL,
temporal REAL,
cooc_boost REAL,
baseline_rank INTEGER,
UNIQUE(event_id, entity_id)
);
CREATE TABLE implicit_feedback (
feedback_id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id INTEGER NOT NULL REFERENCES search_events(event_id),
entity_id INTEGER NOT NULL,
re_accessed INTEGER NOT NULL DEFAULT 0,
access_delta INTEGER,
session_id TEXT
);
-- Reflection tables
CREATE TABLE reflections (
id INTEGER PRIMARY KEY AUTOINCREMENT,
target_type TEXT NOT NULL,
target_id INTEGER,
author TEXT NOT NULL DEFAULT 'sofia',
content TEXT NOT NULL,
mood TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Full-text search (FTS5)
CREATE VIRTUAL TABLE IF NOT EXISTS entity_fts
USING fts5(name, entity_type, obs_text, tokenize="unicode61");
CREATE VIRTUAL TABLE IF NOT EXISTS reflection_fts
USING fts5(content, tokenize="unicode61");
-- Vector embeddings
CREATE VIRTUAL TABLE IF NOT EXISTS reflection_embeddings
USING vec0(embedding float[384] distance_metric=cosine);
-- Indexes
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(entity_type);
CREATE INDEX idx_obs_entity ON observations(entity_id);
CREATE INDEX idx_obs_entity_superseded ON observations(entity_id, superseded_at);
CREATE INDEX idx_rel_from ON relations(from_entity);
CREATE INDEX idx_rel_to ON relations(to_entity);
CREATE INDEX idx_rel_type ON relations(relation_type);
CREATE INDEX idx_access_last ON entity_access(last_access);
CREATE INDEX idx_cooc_b ON co_occurrences(entity_b_id);
CREATE INDEX idx_search_events_hash ON search_events(query_hash);
CREATE INDEX idx_search_events_time ON search_events(timestamp);
CREATE INDEX idx_search_results_event ON search_results(event_id);
CREATE INDEX idx_search_results_entity ON search_results(entity_id);
CREATE INDEX idx_implicit_event ON implicit_feedback(event_id);
Indexes
| Index | Table | Column(s) | Purpose |
|---|---|---|---|
idx_entities_name | entities | name | Fast lookup by entity name |
idx_entities_type | entities | entity_type | Filter by entity type |
idx_obs_entity | observations | entity_id | Retrieve all observations for an entity |
idx_obs_entity_superseded | observations | entity_id, superseded_at | Filter active vs superseded observations |
idx_rel_from | relations | from_entity | Relations originating from an entity |
idx_rel_to | relations | to_entity | Relations targeting an entity |
idx_rel_type | relations | relation_type | Filter by relation type |
idx_access_last | entity_access | last_access | Sort by access recency (temporal decay) |
idx_cooc_b | co_occurrences | entity_b_id | Lookup co-occurrences by entity B |
idx_search_events_hash | search_events | query_hash | Deduplicate identical queries |
idx_search_events_time | search_events | timestamp | Time-range analysis of search quality |
idx_search_results_event | search_results | event_id | Retrieve all results for a search event |
idx_search_results_entity | search_results | entity_id | Find ranking history for a specific entity |
idx_implicit_event | implicit_feedback | event_id | Retrieve feedback for a search event |
Pydantic Models
Pydantic models serve a dual purpose in MCP Memory v2:
- Input validation: Each MCP tool receives JSON from the client. Models validate structure and types before touching the database.
- Output serialization: Tool responses are serialized to consistently typed JSON.
The 19 MCP tools use these models for validation and serialization.
Complete Source (models.py)
from pydantic import BaseModel, Field
class EntityInput(BaseModel):
name: str = Field(..., min_length=1)
entityType: str = Field(default="Generic")
observations: list[str] = Field(default_factory=list)
status: str = Field(default="activo")
class RelationInput(BaseModel):
from_entity: str = Field(..., alias="from")
to_entity: str = Field(..., alias="to")
relationType: str
context: str | None = None
model_config = {"populate_by_name": True}
EntityInput
Input model for creating or updating entities.
| Field | Type | Required | Default | Validation |
|---|---|---|---|---|
name | str | Yes | — | min_length=1 |
entityType | str | No | "Generic" | — |
observations | list[str] | No | [] (via factory) | New list per instance |
status | str | No | "activo" | — |
:::note[default_factory vs default=[]]
observations uses default_factory=list instead of default=[] to avoid the classic Python mutable default argument bug. With default=[], all instances would share the same list object; with default_factory, each instance gets a fresh list.
:::
RelationInput
Input model for creating relations. RelationInput validates incoming client data.
| Field | JSON Alias | Type | Required | Description |
|---|---|---|---|---|
from_entity | "from" | str | Yes | Name of the source entity |
to_entity | "to" | str | Yes | Name of the target entity |
relationType | — | str | Yes | Type of relationship |
context | — | str | No | Optional narrative context for the link |
:::note[Alias design]
from and to are Python reserved keywords, so they cannot be used as attribute names. Pydantic aliases solve this:
- In Python: access as
relation.from_entity(legal name) - In JSON: serializes as
"from"(format expected by Anthropic MCP)
populate_by_name=True enables bidirectional deserialization — both forms are accepted:
# Both work:
RelationInput(**{"from": "EntityA", "to": "EntityB", "relationType": "uses"})
RelationInput(**{"from_entity": "EntityA", "to_entity": "EntityB", "relationType": "uses"})
This guarantees compatibility with Anthropic’s MCP format (which sends "from"/"to") and with internal calls that use Python attribute names.
:::
SQLite PRAGMA Configuration
The following PRAGMAs are set on every database connection in MemoryStore:
PRAGMA journal_mode = WAL # Write without blocking reads
PRAGMA busy_timeout = 10000 # Wait 10s if locked
PRAGMA synchronous = NORMAL # Balance safety vs speed
PRAGMA cache_size = -64000 # 64 MB cache
PRAGMA temp_store = MEMORY # Temp tables in RAM
PRAGMA foreign_keys = ON # Enforce referential integrity
| PRAGMA | Value | Rationale |
|---|---|---|
journal_mode | WAL | Write-Ahead Logging allows concurrent readers while a writer is active. Readers never block writers and writers never block readers. |
busy_timeout | 10000 | Wait up to 10 seconds for lock contention before raising SQLITE_BUSY. In the MCP context (sequential tool calls), this is more than sufficient. |
synchronous | NORMAL | Safe enough for WAL mode (the WAL file is still synced), but faster than FULL which syncs the database file too. The right trade-off for a local knowledge graph. |
cache_size | -64000 | 64 MB page cache. Negative values indicate KiB. Reduces disk I/O for repeated queries. |
temp_store | MEMORY | Temporary tables and intermediate results stay in RAM. Speeds up complex queries and index rebuilding. |
foreign_keys | ON | Enforces ON DELETE CASCADE constraints. Without this pragma, SQLite silently ignores foreign key enforcement. |
WAL Mode and Concurrency
| Operation | Behavior |
|---|---|
| Concurrent reads | Allowed (WAL supports multiple simultaneous readers) |
| Writes | Sequential (single writer) |
| Lock contention | Readers wait up to 10 seconds (busy_timeout) for a write lock |
| Cache | 64 MB in memory to reduce I/O |
In the MCP context, where tool calls are sequential, this model is well-suited.
Technical Gotchas
vec0 Does Not Support CASCADE
Virtual vec0 tables do not participate in ON DELETE CASCADE. When deleting an entity, observations and relations are removed by CASCADE, but the embedding is not. Always delete embeddings manually before the entity row (see the code example in the entity_embeddings section above).
”Cannot Start a Transaction” Error
sqlite-vec can intermittently fail with "cannot start a transaction" during delete operations. This is a known bug in sqlite-vec related to how virtual tables interact with SQLite’s transaction system. Retrying the operation usually resolves it. The codebase catches this error and logs it as a warning without failing.
Embeddings Are Non-Incremental
Each time an entity’s observations change, the embedding is completely regenerated from scratch. The input text includes a full snapshot of all current observations. This means:
- Consistency: the embedding always reflects the current state, with no partial-update artifacts
- Cost: every update triggers a full ONNX encoding (~5 ms on CPU for a single vector)
- Overwrite:
INSERT OR REPLACEin vec0 ensures old versions don’t accumulate
Lazy Embedding Engine
The MCP server starts in ~1 second because it does not load the embedding model at startup. The lazy architecture has two layers:
- Import lazy:
mcp_memory.embeddingsis not imported at module scope inserver.py. The import happens inside_get_engine(). - Instance lazy:
EmbeddingEngine.get_instance()creates the singleton only on the first call.
Consequences:
- First
search_semanticcall: ~3–5 extra seconds while the model loads - Subsequent calls: millisecond responses (engine already in memory)
- Server startup: always fast, regardless of whether the model is downloaded