SQLite data model

The persistent output of camara-senadores-mex is senado.db, a local SQLite database designed to preserve extraction evidence without filling historical gaps. The documented contract separates voting events, roll-call votes, and available senatorial profiles.

Connection settings

The persistence layer opens SQLite with defensive settings:

PRAGMA journal_mode=WAL;
PRAGMA busy_timeout=5000;
PRAGMA foreign_keys=ON;

In Python, get_connection enables WAL, busy_timeout=5000, foreign_keys=ON, and row_factory = sqlite3.Row for crawler/helpers. The validator opens the database in read-only mode with sqlite3.connect(uri, uri=True) and also sets row_factory; it should not be described as re-enabling write/concurrency PRAGMAs on that read-only connection.

Logical schema

CREATE TABLE votaciones (
  id INTEGER PRIMARY KEY,
  legislature INTEGER NOT NULL,
  year INTEGER NOT NULL,
  period INTEGER NOT NULL,
  date TEXT NOT NULL,
  url TEXT NOT NULL,
  scraped_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE senadores (
  id INTEGER PRIMARY KEY,
  nombre TEXT NOT NULL,
  sexo TEXT,
  tipo_eleccion TEXT,
  estado TEXT,
  url TEXT NOT NULL,
  scraped_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE votos_nominales (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  votacion_id INTEGER NOT NULL,
  senador_id INTEGER NOT NULL,
  nombre TEXT NOT NULL,
  partido TEXT NOT NULL,
  voto TEXT NOT NULL,
  FOREIGN KEY (votacion_id) REFERENCES votaciones(id),
  UNIQUE(votacion_id, senador_id)
);

CREATE INDEX idx_votos_votacion_id ON votos_nominales(votacion_id);
CREATE INDEX idx_votos_senador_id ON votos_nominales(senador_id);

The operational schema is read through these guarantees:

  • votaciones.id and senadores.id are institutional primary keys.
  • votos_nominales.id is autoincremental, and UNIQUE(votacion_id, senador_id) avoids duplicates within a vote.
  • votos_nominales.votacion_id references votaciones(id); senador_id is preserved as an institutional identifier, but there is no declared FK to senadores(id).
  • url is required in votaciones and senadores, but the schema does not declare uniqueness on that column.

Persistence

Writes are idempotent:

  • INSERT OR REPLACE updates voting events and profiles when the source returns the same ID again.
  • INSERT OR IGNORE avoids duplicating roll-call votes already persisted by the UNIQUE(votacion_id, senador_id) constraint.

This design supports partial retries without deleting prior evidence. Empty fields are not artificially normalized: they are preserved because they are also signals of source or parsing quality.

Sen./Dip. contract

The README and validator document that the expected dataset is senatorial. Even so, this page does not claim an explicit parser filter by the first Sen./Dip. column: the contract should be read as documented and audited scope, not as an additional guarantee not verified here.