Architecture

The Observatorio del Congreso is a quantitative analysis platform for Mexico’s legislative branch (Cámara de Diputados + Senado de la República). It uses a unified Popolo-Graph schema stored in SQLite to model legislators, parties, votes, and informal power networks across seven legislatures (LX through LXVI, 2006-2026). The dataset covers approximately 3.5 million individual votes, 8,000 vote events, and 3,800 legislators.

┌─────────────────────────────────────────────────────────────────────┐
│                           DATA COLLECTION                           │
│                                                                     │
│  ┌──────────────────────┐     ┌──────────────────────────────────┐  │
│  │ Senado Scraper       │     │ Diputados Scraper                │  │
│  │ curl_cffi + TLS      │     │ httpx + BeautifulSoup            │  │
│  │ fingerprint          │     │                                  │  │
│  │ (Anti-WAF:           │     │ SITL / INFOPAL open portal       │  │
│  │  Incapsula bypass)   │     │ + datos.abiertos API             │  │
│  └──────────┬───────────┘     └──────────────┬───────────────────┘  │
│             │                                │                      │
└─────────────┼────────────────────────────────┼──────────────────────┘
              │                                │
              ▼                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            PARSE & LOAD                             │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ Transformers → Loaders (deduplication via source_id)         │   │
│  └──────────────────────────────┬───────────────────────────────┘   │
│                                 │                                   │
└─────────────────────────────────┼───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                            STORAGE LAYER                            │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ SQLite (WAL mode): congreso.db                               │   │
│  │ Popolo-Graph Schema: 12 tables                               │   │
│  │   area · organization · person · membership · post           │   │
│  │   motion · vote_event · vote · count                         │   │
│  │   actor_externo · relacion_poder · evento_politico           │   │
│  └──────────────────────────────┬───────────────────────────────┘   │
│                                 │                                   │
└─────────────────────────────────┼───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                           ANALYSIS LAYER                            │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐   │
│  │ W-NOMINATE   │  │ Co-voting    │  │ Community Detection      │   │
│  │ (scipy,      │  │ Matrix       │  │ (networkx,               │   │
│  │  numpy)      │  │ & Graph      │  │  python-louvain)         │   │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬──────────────┘   │
│         │                 │                      │                  │
│  ┌──────┴───────┐  ┌──────┴───────┐  ┌───────────┴──────────────┐   │
│  │ Centrality   │  │ Power        │  │ Empirical Power          │   │
│  │ (degree,     │  │ Indices      │  │ (from real voting        │   │
│  │  betweenness)│  │ (Shapley-    │  │  coalitions)             │   │
│  │              │  │  Shubik,     │  │                          │   │
│  │              │  │  Banzhaf)    │  │                          │   │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬──────────────┘   │
│         │                 │                      │                  │
└─────────┼─────────────────┼──────────────────────┼──────────────────┘
          │                 │                      │
          ▼                 ▼                      ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            EXPORT LAYER                             │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ JSON files → public/data/observatorio/                       │   │
│  │ Pre-aggregated, static, no server-side computation           │   │
│  └──────────────────────────────┬───────────────────────────────┘   │
│                                 │                                   │
└─────────────────────────────────┼───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                         VISUALIZATION LAYER                         │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ CachorroSpace (Astro + Starlight)                            │   │
│  │ ECharts 6 via React islands                                  │   │
│  │ Interactive charts: NOMINATE maps, co-voting graphs,         │   │
│  │ power indices, community structures                          │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

| Component | Technology | Purpose |
| --- | --- | --- |
| Scraping (Senado) | curl_cffi + TLS fingerprint impersonation | Anti-WAF evasion for Incapsula-protected portal |
| Scraping (Diputados) | httpx + BeautifulSoup | Open data portal scraping (SITL / INFOPAL) |
| Database | SQLite (WAL mode) | Unified Popolo-Graph storage |
| Analysis: NOMINATE | scipy, numpy, matplotlib | Ideal point estimation (W-NOMINATE algorithm) |
| Analysis: Networks | networkx, python-louvain | Co-voting graphs, community detection |
| Exports | JSON (static) | Pre-aggregated data for visualizations |
| Visualizations | ECharts 6 (React islands) | Interactive charts on CachorroSpace |

| Source | URL | Chamber | Data |
| --- | --- | --- | --- |
| Cámara de Diputados | datos_abiertos / SITL / INFOPAL | Diputados | Voting records, legislator profiles, composition |
| Senado de la República | senado.gob.mx/66/ | Senado | Voting records, senator profiles, directory |

The pipeline processes data in four stages: scraping, loading, analysis, and export.

Each chamber has a dedicated scraper with its own HTTP client, parser, and transformer modules:

  • Senado: curl_cffi session with TLS impersonation retrieves voting pages. Parsers extract vote data from HTML. Transformers normalize data into Popolo-Graph format.
  • Diputados: httpx client with file-based caching and rate limiting queries the SITL/INFOPAL systems. Parsers handle both XML and HTML responses.
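
The file-based caching and rate limiting mentioned for the Diputados client can be pictured with a minimal sketch. Everything named here (`CachedFetcher`, `cache_dir`, `min_interval`, the injected `fetch` callable) is hypothetical and is not the project's actual `client.py`. In real use, `fetch` would wrap an HTTP call such as `lambda url: httpx.get(url).text`.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class CachedFetcher:
    """File-cached, rate-limited fetcher (illustrative; names are hypothetical)."""

    def __init__(self, fetch: Callable[[str], str], cache_dir: Path,
                 min_interval: float = 1.0) -> None:
        self.fetch = fetch                # e.g. lambda url: httpx.get(url).text
        self.cache_dir = cache_dir
        self.min_interval = min_interval  # minimum seconds between real requests
        self._last_request: Optional[float] = None
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url: str) -> str:
        path = self._cache_path(url)
        if path.exists():
            return path.read_text(encoding="utf-8")  # cache hit: no request made
        if self._last_request is not None:
            wait = self.min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)  # throttle only real network requests
        body = self.fetch(url)
        self._last_request = time.monotonic()
        path.write_text(body, encoding="utf-8")
        return body
```

Injecting the fetch function keeps the caching logic testable without network access, and cache hits skip the rate limiter entirely, so re-running a scrape over already-downloaded pages costs no requests.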

Data flows through loaders that insert records into congreso.db with deduplication via the source_id column on the vote_event table. The id_generator module produces human-readable IDs with prefixes (P01, O01, VE01, etc.).
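
As a rough illustration of deduplication via source_id, the sketch below relies on a UNIQUE constraint plus `INSERT OR IGNORE`. The trimmed-down table, the `title` column, the `next_id` and `load_vote_event` helpers, and the count-based numbering are all assumptions made for the example; the real loaders and id_generator module may work differently.

```python
import sqlite3


def next_id(conn: sqlite3.Connection, table: str, prefix: str) -> str:
    """Return the next prefixed ID, e.g. VE01, VE02 (hypothetical scheme)."""
    # The table name comes from trusted code here, never from user input.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return f"{prefix}{count + 1:02d}"


def load_vote_event(conn: sqlite3.Connection, source_id: str, title: str) -> bool:
    """Insert a vote event unless its source_id is already stored."""
    new_id = next_id(conn, "vote_event", "VE")
    cur = conn.execute(
        "INSERT OR IGNORE INTO vote_event (id, source_id, title) VALUES (?, ?, ?)",
        (new_id, source_id, title),
    )
    conn.commit()
    return cur.rowcount == 1  # False means the row was a duplicate


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vote_event ("
    " id TEXT PRIMARY KEY,"
    " source_id TEXT UNIQUE,"  # the UNIQUE constraint drives deduplication
    " title TEXT)"
)
first = load_vote_event(conn, "senado-123", "Example vote")
second = load_vote_event(conn, "senado-123", "Example vote (re-scraped)")
```

Here `first` is inserted and `second` is silently skipped, so re-running a scraper over the same pages does not create duplicate vote events.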

Analysis scripts read from SQLite and compute:

  • W-NOMINATE: Ideal point estimation placing legislators on a 2D ideological map
  • Co-voting matrix: Pairwise agreement rates between legislators, exported as weighted graphs
  • Community detection: Louvain algorithm identifies voting blocs within co-voting networks
  • Centrality: Degree and betweenness centrality measures on co-voting graphs
  • Power indices: Shapley-Shubik and Banzhaf indices based on seat distributions
  • Empirical power: Measured from real voting coalition data, not just seat counts
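
Two of the measures above are simple enough to sketch in a few lines: the pairwise agreement rate behind the co-voting matrix and the normalized Banzhaf index. This is a minimal illustration under stated assumptions, not the code in covotacion.py or poder_partidos.py. The function names, the vote encoding, and the three-party example are invented, and the real scripts may treat abstentions and absences differently.

```python
from itertools import combinations


def agreement_rate(votes_a: dict, votes_b: dict) -> float:
    """Share of jointly attended vote events on which two legislators agree."""
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return 0.0
    return sum(votes_a[e] == votes_b[e] for e in shared) / len(shared)


def banzhaf(seats: dict, quota: int) -> dict:
    """Normalized Banzhaf index by brute-force coalition enumeration."""
    parties = list(seats)
    swings = dict.fromkeys(parties, 0)
    for size in range(1, len(parties) + 1):
        for coalition in combinations(parties, size):
            total = sum(seats[p] for p in coalition)
            if total < quota:
                continue  # losing coalition: nobody is critical
            for p in coalition:
                if total - seats[p] < quota:
                    swings[p] += 1  # p leaving turns a win into a loss
    all_swings = sum(swings.values())
    if all_swings == 0:
        return {p: 0.0 for p in parties}  # quota unreachable
    return {p: swings[p] / all_swings for p in parties}


# Invented example: three parties, simple majority of 100 seats.
power = banzhaf({"A": 50, "B": 30, "C": 20}, quota=51)
rate = agreement_rate(
    {"VE01": "yes", "VE02": "no", "VE03": "yes"},
    {"VE01": "yes", "VE02": "yes", "VE04": "no"},
)
```

In this example party A holds half the seats but 60% of the Banzhaf power, because it is critical in every winning coalition. Brute-force enumeration grows as 2^n in the number of parties, which is acceptable for a handful of parliamentary groups but not for individual legislators.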

The export_observatorio_json.py script reads analysis CSV outputs and produces static JSON files consumed by ECharts 6 visualizations embedded as React islands in CachorroSpace.

analysis/output/*.csv
  → export_observatorio_json.py
  → public/data/observatorio/*.json
  → React ECharts islands (CachorroSpace)
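
The CSV-to-JSON step can be approximated with the standard library alone. The sketch below is an assumption about the general shape of such an export, not the contents of export_observatorio_json.py; the `export_csv_to_json` name and the array-of-rows output layout are hypothetical.

```python
import csv
import json
from pathlib import Path


def export_csv_to_json(csv_path: Path, json_path: Path) -> int:
    """Convert one analysis CSV into a JSON array of row objects."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))  # every value is read as a string
    json_path.parent.mkdir(parents=True, exist_ok=True)
    with json_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented names readable in the output.
        json.dump(rows, fh, ensure_ascii=False, indent=2)
    return len(rows)
```

Because `csv.DictReader` yields strings, a real export would likely cast numeric columns before writing so the charts receive numbers rather than text.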

observatorio-congreso/
├── db/
│   ├── schema.sql  # Popolo-Graph schema (12 tables)
│   ├── senado_schema.sql  # Senado-specific schema extensions
│   ├── init_db.py  # Database initialization + seed data
│   ├── helpers.py  # SQLite helper functions
│   ├── id_generator.py  # Human-readable ID generation (P01, O01...)
│   ├── constants.py  # Legislature mappings and constants
│   ├── migrations/  # Schema migrations and data fixes
│   └── congreso.db  # SQLite database (~3.5M votes)
├── diputados/
│   └── scraper/
│       ├── client.py  # httpx HTTP client with cache
│       ├── config.py  # Scraper configuration
│       ├── pipeline.py  # Main scraping pipeline
│       ├── loader.py  # SQLite loader (dedup via source_id)
│       ├── models.py  # Data models
│       ├── legislatura.py  # Legislature range logic
│       └── parsers/
│           ├── votaciones.py  # Vote event parser
│           ├── nominal.py  # Nominal (roll-call) vote parser
│           ├── desglose.py  # Vote breakdown parser
│           ├── diputado.py  # Legislator profile parser
│           └── composicion.py  # Chamber composition parser
├── senado/
│   └── scrapers/
│       ├── shared/
│       │   ├── client.py  # Anti-WAF client (curl_cffi)
│       │   ├── config.py  # Scraper configuration
│       │   └── models.py  # Shared data models
│       ├── votaciones/
│       │   ├── __main__.py  # CLI entry point
│       │   ├── cli.py  # Command-line interface
│       │   ├── transformers.py  # Data normalization
│       │   ├── congreso_loader.py  # SQLite loader
│       │   └── parsers/
│       │       └── lxvi_portal.py  # LXVI portal parser
│       └── perfiles/
│           ├── __main__.py  # CLI entry point
│           ├── scraper.py  # Profile scraper
│           └── parsers/
│               └── perfil_parser.py  # Senator profile parser
├── analysis/
│   ├── nominate.py  # W-NOMINATE implementation
│   ├── covotacion.py  # Co-voting matrix and graph
│   ├── covotacion_dinamica.py  # Dynamic (time-windowed) co-voting
│   ├── comunidades.py  # Louvain community detection
│   ├── centralidad.py  # Degree and betweenness centrality
│   ├── poder_partidos.py  # Shapley-Shubik and Banzhaf indices
│   ├── poder_empirico.py  # Empirical power from real votes
│   ├── run_analysis.py  # Run all analyses
│   ├── run_nominate.py  # Run NOMINATE only
│   ├── run_covotacion_dinamica.py  # Run dynamic co-voting
│   ├── visualizacion.py  # General visualization exports
│   ├── visualizacion_nominate.py  # NOMINATE chart data
│   ├── visualizacion_dinamica.py  # Dynamic co-voting chart data
│   ├── visualizacion_poder.py  # Power indices chart data
│   ├── visualizacion_articulo.py  # Article-specific visualizations
│   ├── scripts/
│   │   └── export_observatorio_json.py  # CSV → JSON for ECharts
│   ├── analisis-diputados/  # Diputados-specific analysis outputs
│   ├── analisis-senado/  # Senado-specific analysis outputs
│   └── analisis-bicameral/  # Cross-chamber analysis outputs
├── utils/
│   ├── db_utils.py  # Database utility functions
│   ├── text_utils.py  # Text normalization utilities
│   └── tests/
│       └── test_text_utils.py  # Text utility tests
├── cache/  # HTTP response cache
├── logs/  # Scraper logs
├── pyproject.toml  # Project dependencies (uv)
└── scrape_diputados_all.sh  # Batch scraper script

SQLite is configured for safe concurrent access and data integrity:

PRAGMA foreign_keys = ON;
PRAGMA encoding = "UTF-8";
PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;

| Setting | Value | Purpose |
| --- | --- | --- |
| journal_mode | WAL | Concurrent reads without blocking writes |
| foreign_keys | ON | Enforce referential integrity between tables |
| busy_timeout | 5000 ms | Wait up to 5 seconds if database is locked |
| encoding | UTF-8 | Correct handling of Spanish characters (accents, ñ) |
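
From Python, these settings would typically be applied each time a connection is opened, since foreign_keys and busy_timeout are per-connection while journal_mode is stored in the database file. The `connect` helper below is a hypothetical sketch and is not taken from the project's helpers.py or db_utils.py.

```python
import sqlite3


def connect(db_path: str) -> sqlite3.Connection:
    """Open a connection and apply the PRAGMAs listed above."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")    # per-connection, off by default
    conn.execute('PRAGMA encoding = "UTF-8"')   # only affects a new, empty database
    conn.execute("PRAGMA journal_mode = WAL")   # persists in the database file
    conn.execute("PRAGMA busy_timeout = 5000")  # per-connection, in milliseconds
    return conn
```

Note that WAL mode needs a file-backed database; an in-memory database reports `memory` as its journal mode instead.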

The Popolo-Graph schema contains 12 tables, organized into two groups:

Core Popolo entities (legislative data standard):

| Table | Purpose |
| --- | --- |
| area | Geographic divisions (states, districts, constituencies) |
| organization | Political parties, blocs, coalitions, institutions |
| person | Legislators and political actors |
| membership | Person-to-organization relationships with roles and dates |
| post | Legislative positions within organizations and areas |
| motion | Bills and legislative initiatives |
| vote_event | Specific voting instances (chamber + date) |
| vote | Individual legislator votes per event |
| count | Aggregated vote counts per group per event |

Power network extensions (beyond standard Popolo):

| Table | Purpose |
| --- | --- |
| actor_externo | External actors (governors, party leaders, judges) |
| relacion_poder | Informal power relationships (loyalty, pressure, alliances) |
| evento_politico | Political events that affect power dynamics |

The schema includes indexes on the most common query patterns:

  • membership queries by person and by organization
  • vote_event lookups by motion and by source_id (deduplication)
  • vote queries by voter and by event
  • count queries by event and by group
  • relacion_poder queries by source, target, and type
  • person filtering by internal faction (corriente_interna)
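
To see how such indexes serve the query patterns listed above, the sketch below builds a trimmed-down vote table and asks SQLite for its query plan. The column names (`voter_id`, `vote_event_id`, `vote_option`) and index names are assumptions loosely based on Popolo naming; the real schema.sql may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, trimmed-down vote table for illustration only.
    CREATE TABLE vote (
        id TEXT PRIMARY KEY,
        vote_event_id TEXT NOT NULL,
        voter_id TEXT NOT NULL,
        vote_option TEXT NOT NULL
    );
    CREATE INDEX idx_vote_voter ON vote (voter_id);
    CREATE INDEX idx_vote_event ON vote (vote_event_id);
    """
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan = query_plan("SELECT vote_option FROM vote WHERE voter_id = ?", ("P01",))
print(plan)
```

The printed plan should mention `idx_vote_voter`, showing that a per-legislator lookup is an index search rather than a scan over millions of vote rows.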

Date validation triggers ensure end_date >= start_date on person and membership tables for both inserts and updates. These fire at the SQLite level to prevent data corruption regardless of which loader writes the data.
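
A date validation trigger of this kind might look like the following sketch. The trigger names, the trimmed-down membership table, and the error message are assumptions, and the comparison relies on dates being stored as ISO 8601 text (YYYY-MM-DD), which sorts chronologically as plain strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, trimmed-down membership table for illustration only.
    CREATE TABLE membership (
        id TEXT PRIMARY KEY,
        person_id TEXT NOT NULL,
        start_date TEXT,
        end_date TEXT
    );
    CREATE TRIGGER membership_dates_insert
    BEFORE INSERT ON membership
    WHEN NEW.start_date IS NOT NULL AND NEW.end_date IS NOT NULL
         AND NEW.end_date < NEW.start_date
    BEGIN
        SELECT RAISE(ABORT, 'end_date must be >= start_date');
    END;
    CREATE TRIGGER membership_dates_update
    BEFORE UPDATE ON membership
    WHEN NEW.start_date IS NOT NULL AND NEW.end_date IS NOT NULL
         AND NEW.end_date < NEW.start_date
    BEGIN
        SELECT RAISE(ABORT, 'end_date must be >= start_date');
    END;
    """
)


def try_insert(row_id: str, start: str, end: str) -> bool:
    """Attempt an insert; return False if the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO membership VALUES (?, 'P01', ?, ?)", (row_id, start, end)
        )
        return True
    except sqlite3.DatabaseError:
        return False


ok = try_insert("M01", "2018-09-01", "2021-08-31")   # valid range
bad = try_insert("M02", "2021-09-01", "2018-08-31")  # end before start
```

Because the check lives in the database, a loader that bypasses Python-level validation still cannot store an inverted date range.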