Overview
The Observatorio del Congreso pipeline runs six analysis modules that transform raw roll-call voting data into political structure and power insights. The processing chain goes from individual vote records to ideological positions, co-voting networks, community partitions, centrality rankings, and formal vs. empirical power indices.
| Module | Input | Output | Core Algorithm |
|---|---|---|---|
| W-NOMINATE | Vote matrix (legislators x events) | Ideal point coordinates | SVD + Newton-Raphson |
| Co-Voting | Vote records | NxN similarity matrix + graph | Agreement counting |
| Communities | Co-voting graph | Partition dict (node -> community) | Louvain (nx.community) |
| Centrality | Co-voting graph | Per-node scores | Weighted degree, betweenness |
| Power Indices | Seat counts per party | Shapley-Shubik, Banzhaf values | Dynamic programming O(n²W) |
| Empirical Power | Vote records + seat counts | Critical party frequencies | Roll-call analysis |
Configuration Layer
The analysis pipeline centralizes tuneable parameters in analysis/config.py. This allows adjusting thresholds and algorithm settings without modifying individual modules.
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| MIN_EVENTS_PER_WINDOW | int | 30 | Minimum vote events per time window (windows below this are merged) |
| LOUVAIN_SEED | int | 42 | Random seed for Louvain reproducibility |
| LOUVAIN_RESOLUTION | float | 1.0 | Louvain resolution (> 1.0 = smaller communities) |
| CLOSE_VOTES_THRESHOLD | int | 10 | Maximum margin for "close vote" classification |
| REFORMA_JUDICIAL_VE_IDS | list[str] | ["VE04", "VE05"] | Vote event IDs for Judicial Reform analysis |
| TOP_DISSENTERS_GLOBAL | int | 10 | Number of top dissenters to return (global) |
| TOP_DISSIDENTS_PER_WINDOW | int | 5 | Number of top dissenters per time window |
| DEALIGNMENT_THRESHOLD | float | -0.05 | Minimum co-voting change to detect party dealignment |
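For orientation, the table corresponds to a module of plain constants along the following lines. The names and values come from the table; the layout of the file itself is an assumption.

```python
# analysis/config.py (sketch: values from the table, file layout assumed)

MIN_EVENTS_PER_WINDOW: int = 30          # smaller windows are merged
LOUVAIN_SEED: int = 42                   # reproducible partitions
LOUVAIN_RESOLUTION: float = 1.0          # > 1.0 yields smaller communities
CLOSE_VOTES_THRESHOLD: int = 10          # maximum margin for a "close vote"
REFORMA_JUDICIAL_VE_IDS: list[str] = ["VE04", "VE05"]
TOP_DISSENTERS_GLOBAL: int = 10
TOP_DISSIDENTS_PER_WINDOW: int = 5
DEALIGNMENT_THRESHOLD: float = -0.05
```

Modules would then read these values through an import such as `from analysis import config`, so that a threshold change touches a single file.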
Data Access Layer
analysis/db.py centralizes SQLite access for all analysis modules. It provides a connection factory with PRAGMAs configured and five parametrized query functions that eliminate SQL duplication across modules.
Connection Factory
get_connection(db_path=None) returns a SQLite connection with:
- journal_mode = WAL (concurrent reads)
- busy_timeout = 5000ms (retry on lock)
- foreign_keys = ON (enforce referential integrity)
- row_factory = sqlite3.Row (dict-like column access)
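A minimal sketch of such a factory, using only the standard sqlite3 module. The default path `DEFAULT_DB_PATH` is hypothetical; the real default is not stated here.

```python
import sqlite3
from pathlib import Path

# Hypothetical default location; the actual default path is an assumption.
DEFAULT_DB_PATH = Path("congreso.db")


def get_connection(db_path=None):
    """Open a SQLite connection with the settings listed above."""
    conn = sqlite3.connect(str(db_path or DEFAULT_DB_PATH))
    conn.row_factory = sqlite3.Row              # dict-like column access
    conn.execute("PRAGMA journal_mode = WAL")   # concurrent reads
    conn.execute("PRAGMA busy_timeout = 5000")  # wait up to 5000 ms on a lock
    conn.execute("PRAGMA foreign_keys = ON")    # enforce referential integrity
    return conn
```

Note that foreign_keys and busy_timeout are per-connection settings in SQLite, which is one reason to route every module through a single factory.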
Query Functions
| Function | Parameters | Returns |
|---|---|---|
| get_vote_events() | legislatura, organization_id, result | Filtered vote_event rows |
| get_votes() | vote_event_id, voter_id, join_vote_event | Vote rows with optional legislatura/camera |
| get_persons() | person_id | Person rows |
| get_organizations() | clasificacion | Organization rows |
| get_memberships() | person_id, org_id, rol | Membership rows |
All functions use parameterized queries (no string interpolation) and proper connection lifecycle management (try/finally close).
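The pattern can be sketched for one of these functions as follows. This is an illustration under assumptions: the `db_path` argument, the `vote_event` table name, and column names matching the parameter names are not confirmed by the text.

```python
import sqlite3


def get_vote_events(db_path, legislatura=None, organization_id=None, result=None):
    """Return vote_event rows, filtered by whichever arguments are given."""
    # Fixed clause strings; only the values are bound as parameters.
    candidates = [
        ("legislatura = ?", legislatura),
        ("organization_id = ?", organization_id),
        ("result = ?", result),
    ]
    clauses = [clause for clause, value in candidates if value is not None]
    params = [value for _, value in candidates if value is not None]

    sql = "SELECT * FROM vote_event"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)

    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()  # always release the connection, even on error
```

Because user-supplied values only ever travel through the `?` placeholders, the query text itself never contains caller data.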
Shared Constants
analysis/constants.py centralizes party colors, name mappings, and canonical ordering used across all visualization and analysis modules.
| Constant | Type | Purpose |
|---|---|---|
| PARTY_COLORS | dict[str, str] | Matplotlib colors per party (key = uppercase short name) |
| DEFAULT_COLOR | str | Fallback color (#CCCCCC) |
| ORG_TO_SHORT | dict[str, str] | org_id → short name mapping (O01 → MORENA) |
| PARTY_ORDER | list[str] | Canonical party ordering for visualizations |
| COMMON_PARTIES | list[str] | Parties present in both chambers |
| CAMARA_MAP | dict[str, str] | Chamber name → code (diputados → D) |
| COLORES_WEB | dict[str, str] | ECharts-compatible colors for web export |
| PARTIDO_MAP | dict[str, str] | Full name → abbreviation for JSON export |
W-NOMINATE: Ideal Point Estimation
W-NOMINATE (Weighted Nominal Three-Step Estimation) is the standard algorithm for estimating legislator ideological positions from roll call votes, developed by Poole & Rosenthal (1985, 1997).
References:
- Poole & Rosenthal (1985). “A Spatial Model for Legislative Roll Call Analysis”. American Journal of Political Science, 29(2), 357-384.
- Poole & Rosenthal (1997). Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.
- Poole (2005). Spatial Models of Parliamentary Voting. Cambridge University Press.
How It Works
The algorithm takes a binary vote matrix and recovers legislator ideal points in a low-dimensional policy space.
Step 1: Binarization. Raw vote strings are mapped to binary values:
| Vote type | Binary value |
|---|---|
| a_favor | 1 (Yea) |
| en_contra | 0 (Nay) |
| abstencion | NaN (excluded) |
| ausente | NaN (excluded) |
Step 2: Filtering. Low-information votes and inactive legislators are removed:
min_votes = 10 # minimum binary votes per legislator
min_participants = 10 # minimum binary participants per vote event
lopsided_threshold = 0.975 # filter near-unanimous votes
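Steps 1 and 2 can be sketched with NumPy as below. The function names, the order of the two filters (vote events first, then legislators), and the exact comparison operators are assumptions made for illustration.

```python
import numpy as np

VOTE_TO_BINARY = {"a_favor": 1.0, "en_contra": 0.0}  # anything else becomes NaN


def binarize(raw):
    """Map a 2D list of vote strings (legislators x events) to 1.0 / 0.0 / NaN."""
    return np.array([[VOTE_TO_BINARY.get(v, np.nan) for v in row] for row in raw])


def filter_matrix(matrix, min_votes=10, min_participants=10, lopsided_threshold=0.975):
    """Drop low-information vote events, then inactive legislators."""
    voted = ~np.isnan(matrix)
    participants = voted.sum(axis=0)
    yea_share = np.nansum(matrix, axis=0) / np.maximum(participants, 1)
    majority_share = np.maximum(yea_share, 1.0 - yea_share)
    keep_events = (participants >= min_participants) & (majority_share <= lopsided_threshold)
    matrix = matrix[:, keep_events]
    keep_legislators = (~np.isnan(matrix)).sum(axis=1) >= min_votes
    return matrix[keep_legislators], keep_legislators, keep_events
```

Returning the two boolean masks alongside the filtered matrix makes it possible to map rows and columns back to the original legislator and vote event IDs.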
Step 3: Estimation. The algorithm estimates two parameters per legislator (coordinates in 2D space) and two per vote event:
- Ideal points (x_i, y_i): each legislator’s position in the policy space
- Salience weights (beta): how sharply a legislator’s utility drops with distance from the cutting plane
- Normal vectors (w_j): define the cutting plane separating Yea from Nay for each vote
Initialization uses a singular value decomposition (SVD) of the binary matrix to obtain starting coordinates; Newton-Raphson optimization then maximizes the classification likelihood.
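The initialization step might look roughly like the sketch below. It is not the module's code: imputing missing votes with column means, centering, and rescaling to the unit circle are assumptions, and `numpy.linalg.svd` stands in for whichever SVD routine the module calls.

```python
import numpy as np


def svd_start(matrix, dims=2):
    """Starting coordinates from an SVD of a binary vote matrix containing NaNs."""
    column_means = np.nanmean(matrix, axis=0)
    filled = np.where(np.isnan(matrix), column_means, matrix)  # impute missing votes
    centered = filled - filled.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :dims] * s[:dims]               # legislator scores on the top dimensions
    max_norm = np.linalg.norm(coords, axis=1).max()
    return coords / max_norm if max_norm > 0 else coords
```

Legislators who vote alike receive nearby starting coordinates, which gives the iterative optimizer a reasonable point to begin from.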
Quality Metrics
| Metric | Description | Interpretation |
|---|---|---|
| Classification rate | % of votes correctly predicted by the model | Higher = better fit; typical range 85-95% |
| APRE | Aggregate Proportional Reduction in Error | Improvement over baseline (majority prediction); 0.0 = no improvement, 1.0 = perfect |
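Both metrics follow directly from observed and predicted votes. The sketch below uses a hypothetical `fit_metrics` helper; APRE is computed as the sum over vote events of (minority votes minus classification errors), divided by the total number of minority votes.

```python
def fit_metrics(actual, predicted):
    """Classification rate and APRE from per-event lists of 0/1 votes.

    actual[j] and predicted[j] hold the observed and predicted votes for
    vote event j, with missing votes already removed.
    """
    total = errors = minority = 0
    for obs, pred in zip(actual, predicted):
        yeas = sum(obs)
        total += len(obs)
        minority += min(yeas, len(obs) - yeas)   # errors made by the majority baseline
        errors += sum(o != p for o, p in zip(obs, pred))
    classification_rate = 1 - errors / total
    apre = (minority - errors) / minority
    return classification_rate, apre
```

For example, with two vote events of four voters each, two minority votes in total, and one misclassified vote, the classification rate is 7/8 and the APRE is (2 - 1) / 2 = 0.5.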
Implementation Variants
- nominate_by_legislatura: runs W-NOMINATE separately for each legislature, producing independent ideal point spaces per period
- nominate_cross_legislatura: combines all legislatures into a single run, placing all legislators in a shared space for direct comparison
Dependencies
scipy (svd, minimize, norm), numpy, pandas
Co-Voting Analysis
Co-voting measures how often each pair of legislators vote the same way. This is the foundation for all network-based analysis.
Construction Pipeline
- Load data: votes, persons, and organizations from SQLite
- Party normalization: normalize_party() maps mixed vote.group values to canonical organization IDs
- Primary party assignment: get_primary_party() assigns each legislator to their most frequent party
- Build matrix: build_covotacion_matrix() produces an NxN numpy matrix where entry (i,j) = agreement count between legislators i and j, normalized to 0-1
- Build graph: build_graph() converts the matrix to a NetworkX graph with:
  - Nodes: legislators, with attributes for party, gender
  - Edges: co-voting pairs, with weight = normalized similarity
# Simplified co-voting weight calculation.
# votes_i and votes_j are assumed to be sets of (vote event, vote) pairs.
for i, j in legislator_pairs:
    shared_votes = votes_i.intersection(votes_j)
    total_votes = votes_i.union(votes_j)
    weight = len(shared_votes) / len(total_votes)
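A runnable version of the simplified calculation, assuming each legislator's votes are stored as a set of (vote_event_id, option) pairs. The `covoting_weights` name and the input layout are hypothetical.

```python
from itertools import combinations


def covoting_weights(votes):
    """Pairwise co-voting weights from {legislator: {(event_id, option), ...}}."""
    weights = {}
    for i, j in combinations(sorted(votes), 2):
        shared = votes[i] & votes[j]   # same option on the same vote event
        total = votes[i] | votes[j]    # every distinct (event, option) pair
        weights[(i, j)] = len(shared) / len(total) if total else 0.0
    return weights
```

With this representation the weight is a Jaccard similarity: a disagreement on a vote event contributes two pairs to the union and none to the intersection, so weights fall quickly as disagreement grows.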
Output
The module returns a dict containing:
| Key | Type | Description |
|---|---|---|
| matrix | NxN numpy array | Pairwise co-voting similarity |
| graph | networkx.Graph | Weighted co-voting network |
| party_map | dict | person_id -> party mapping |
| org_map | dict | org_id -> party name mapping |
| persons_df | DataFrame | Legislator metadata |
Community Detection (Louvain)
The Louvain algorithm detects communities of legislators who vote similarly, going beyond formal party labels to reveal actual voting blocs.
Algorithm
Louvain performs two-phase iterative optimization:
- Local moving: each node moves to the neighbor community that yields the largest modularity gain
- Aggregation: communities are collapsed into super-nodes, and the process repeats
The resolution parameter controls community granularity:
| Resolution | Effect |
|---|---|
| < 1.0 | Fewer, larger communities (coarser) |
| 1.0 (default) | Standard modularity |
| > 1.0 | More, smaller communities (finer) |
Implementation
The module uses NetworkX’s built-in Louvain implementation (nx.community.louvain_communities, part of NetworkX since version 2.7), configured via the shared config.py parameters:
communities = nx.community.louvain_communities(
    graph,
    weight="weight",
    resolution=config.LOUVAIN_RESOLUTION,
    seed=config.LOUVAIN_SEED,  # 42 for reproducibility
)
:::note
The seed parameter ensures reproducible community partitions across runs. The resolution value can be tuned in config.py without modifying the detection module.
:::
Output
detect_communities() returns a partition dict mapping each node_id to a community_id.
analyze_communities() produces detailed analysis per community:
- Party composition: count and percentage of each party within the community
- Purity metric: percentage of the dominant party (100% = pure party bloc)
- Cross-party legislators: individuals whose community differs from their formal party
- Sub-blocs: detection of internal factions within large parties (specifically MORENA sub-blocs)
:::tip
A community with purity below 70% signals a genuine cross-party coalition, not just a party label. These mixed communities often reveal real legislative alliances.
:::
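The partition dict and the purity metric can be sketched in plain Python. nx.community.louvain_communities returns a list of node sets, so a conversion step is needed to obtain the node-to-community mapping; the helper names `to_partition` and `community_purity` are hypothetical.

```python
from collections import Counter


def to_partition(communities):
    """Convert a list of node sets into a {node_id: community_id} dict."""
    return {node: cid for cid, members in enumerate(communities) for node in members}


def community_purity(partition, party_map):
    """Share of the dominant party in each community, as a percentage."""
    by_community = {}
    for node, cid in partition.items():
        by_community.setdefault(cid, Counter())[party_map[node]] += 1
    return {
        cid: 100.0 * counts.most_common(1)[0][1] / sum(counts.values())
        for cid, counts in by_community.items()
    }
```

A community of two MORENA legislators and one PT legislator, for instance, has a purity of about 66.7%, which falls under the 70% mark mentioned above.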
Dependencies
networkx (nx.community.louvain_communities, built in since NetworkX 2.7)
Centrality Metrics
Centrality identifies structurally important legislators in the co-voting network. Two complementary metrics are used.
Weighted Degree Centrality
centrality[node] = weighted_degree(node) / max_weighted_degree
Each node’s weighted degree is the sum of its edge weights (total co-voting intensity with all other legislators), normalized by the maximum weighted degree in the graph. Values range from 0.0 to 1.0.
Interpretation: high weighted degree = legislator co-votes heavily with many others, indicating alignment with the dominant coalition.
Betweenness Centrality
betweenness = nx.betweenness_centrality(graph, weight=None)
Betweenness is computed unweighted (weight=None). This is a deliberate choice:
:::tip
Co-voting weights are similarity measures, not geodesic distances. Higher weight = more similar = closer. If passed as weight to NetworkX, the algorithm would interpret them as costs, treating strongly co-voting pairs as far apart. This inverts the intended interpretation. Using weight=None counts each edge as one hop, correctly identifying legislators who bridge distinct voting blocs.
:::
Interpretation: high betweenness = legislator sits on the shortest paths between different communities, acting as a potential broker or swing voter.
| Metric | Weight handling | Captures |
|---|---|---|
| Weighted Degree | Uses weights | Overall co-voting intensity |
| Betweenness | Ignores weights (weight=None) | Structural bridging position |
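The contrast between the two metrics shows up clearly on a toy graph. In the sketch below, `centrality_scores` is a hypothetical wrapper and the edge weights are made-up illustration values; only the NetworkX calls themselves are real API.

```python
import networkx as nx


def centrality_scores(graph):
    """Normalized weighted degree and unweighted betweenness per node."""
    weighted = dict(graph.degree(weight="weight"))
    max_weighted = max(weighted.values(), default=0) or 1.0
    degree = {node: value / max_weighted for node, value in weighted.items()}
    betweenness = nx.betweenness_centrality(graph, weight=None)
    return degree, betweenness


# Two tight blocs (a1-a2-a3 and b1-b2-b3) joined only through "bridge".
g = nx.Graph()
g.add_weighted_edges_from([
    ("a1", "a2", 0.9), ("a2", "a3", 0.9), ("a1", "a3", 0.9),
    ("b1", "b2", 0.9), ("b2", "b3", 0.9), ("b1", "b3", 0.9),
    ("a3", "bridge", 0.4), ("bridge", "b1", 0.4),
])
degree, betweenness = centrality_scores(g)
```

Here the bridge node has the lowest weighted degree in the graph yet the highest betweenness, since every path between the two blocs passes through it. That is the broker profile the unweighted metric is meant to surface.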
Power Indices (Nominal)
Nominal power indices calculate how much bargaining power each party has based solely on seat counts, assuming all members vote with their party.
Shapley-Shubik Index
For a party p, the Shapley-Shubik index computes marginal power using dynamic programming instead of brute-force permutation enumeration:
from math import factorial

def shapley_shubik(player_weights, quota):
    n = len(player_weights)
    results = {}
    for i in range(n):
        weights_without_i = player_weights[:i] + player_weights[i + 1:]
        # dp[s][w] = number of subsets of size s
        # with total weight w (excluding player i)
        dp = build_dp_table(weights_without_i, quota)
        ss_i = 0.0
        for s in range(n):
            for w in range(quota):
                # losing coalition that player i turns into a winning one
                if w + player_weights[i] >= quota:
                    ss_i += dp[s][w] * factorial(s) * factorial(n - 1 - s)
        results[i] = ss_i / factorial(n)
    return results
A party is critical (a pivot) when it joins a coalition that is below the quota, and its weight pushes the total to or above the quota. The DP table counts how many coalition configurations make each party critical.
Complexity: O(n²W) where n = number of parties and W = quota. For 13 parties with quota ~251, n²W comes to roughly 42K table updates per player (about 550K across all 13 players), compared to roughly 6.2 billion (13!) orderings with brute-force permutation enumeration.
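The listing above relies on a build_dp_table helper that is not shown. A self-contained sketch follows, in which build_dp_table is a hypothetical stand-in for that helper and weights are assumed to be non-negative integers with a positive integer quota.

```python
from math import factorial


def build_dp_table(weights, quota):
    """dp[s][w] = number of subsets of `weights` with size s and total weight w < quota."""
    dp = [[0] * quota for _ in range(len(weights) + 1)]
    dp[0][0] = 1
    for weight in weights:
        # Iterate sizes downward so each weight is used at most once.
        for s in range(len(weights) - 1, -1, -1):
            for w in range(quota - weight):
                dp[s + 1][w + weight] += dp[s][w]
    return dp


def shapley_shubik(player_weights, quota):
    n = len(player_weights)
    results = {}
    for i, own in enumerate(player_weights):
        others = player_weights[:i] + player_weights[i + 1:]
        dp = build_dp_table(others, quota)
        pivots = 0
        for s in range(n):                                # coalition size before i joins
            for w in range(max(0, quota - own), quota):   # losing now, winning with i
                pivots += dp[s][w] * factorial(s) * factorial(n - 1 - s)
        results[i] = pivots / factorial(n)
    return results
```

As a check, weights [50, 49, 1] with quota 51 give indices of 4/6, 1/6, and 1/6: the one-seat party is pivotal exactly as often as the 49-seat party, which is the kind of divergence from seat share the index is designed to expose.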
Banzhaf Index
For a party p, count all winning coalitions where p is critical (its defection flips the outcome):
for coalition in all_subsets(parties):
    for party in coalition:
        if coalition is winning AND coalition - {party} is losing:
            party is critical in this coalition
banzhaf_index[party] = critical_count[party] / total_critical_count
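The pseudocode translates into a short enumeration over all coalitions. This is a sketch with a hypothetical `banzhaf` function that takes a {party: seats} dict; direct enumeration visits 2^n coalitions, which stays small for a 13-party chamber (8,192 subsets).

```python
from itertools import combinations


def banzhaf(seats, quota):
    """Normalized Banzhaf index from {party: seats} by enumerating coalitions."""
    parties = list(seats)
    critical = {party: 0 for party in parties}
    for size in range(1, len(parties) + 1):
        for coalition in combinations(parties, size):
            total = sum(seats[p] for p in coalition)
            if total < quota:
                continue  # losing coalition: nobody is critical
            for party in coalition:
                if total - seats[party] < quota:  # defection flips the outcome
                    critical[party] += 1
    total_critical = sum(critical.values())
    return {p: (c / total_critical if total_critical else 0.0) for p, c in critical.items()}
```

For seats of 50, 49, and 1 with quota 51, the critical counts are 3, 1, and 1, giving indices of 0.6, 0.2, and 0.2.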
Seat Assignment
Multi-membership (legislators belonging to more than one party across their career) is resolved by:
- Collect all party memberships for each legislator
- Assign to the party where the legislator cast the most votes
- Ties broken by most recent start_date membership
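The resolution rule fits in a single expression. The sketch below assumes a hypothetical `assign_party` helper, per-party vote counts, and ISO-formatted start_date strings (which sort chronologically as plain text).

```python
def assign_party(vote_counts, start_dates):
    """Pick one party for a legislator with several memberships.

    vote_counts: {party: number of votes cast under that party}
    start_dates: {party: ISO start_date of the membership, e.g. "2021-09-01"}
    """
    # Most votes wins; on a tie, the membership with the later start_date wins.
    return max(vote_counts, key=lambda party: (vote_counts[party], start_dates[party]))
```

Comparing (votes, start_date) tuples applies the tie-break only when vote counts are equal, matching the order of the rules above.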
Per-Chamber Analysis
Both indices support separate analysis for Diputados (camara='D') and Senado (camara='S').
Empirical Power Analysis
Nominal power assumes party discipline. Empirical power measures what actually happens in roll-call votes.
Critical Parties per Vote
For each vote event, the module identifies which parties were necessary to reach the majority threshold:
winning_coalition = parties that voted with the majority
for party in winning_coalition:
    seats_without = majority_seats - party_seats
    if seats_without < majority_threshold:
        party is "critical" for this vote
Empirical Power Index
empirical_power[party] = times_critical[party] / total_vote_events
This produces a 0.0-1.0 score reflecting how often a party’s votes were actually decisive.
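Put together, the critical-party test and the index can be sketched as follows. The function names and the input layout (one {party: seats on the winning side} dict per vote event) are assumptions, and the seat numbers in the usage note are illustrative only.

```python
from collections import Counter


def critical_parties(majority_seats_by_party, threshold):
    """Parties whose removal would drop the winning side below the threshold."""
    majority_seats = sum(majority_seats_by_party.values())
    return {
        party
        for party, party_seats in majority_seats_by_party.items()
        if majority_seats - party_seats < threshold
    }


def empirical_power(events, threshold):
    """Share of vote events in which each party was critical."""
    times_critical = Counter()
    for majority_seats_by_party in events:
        times_critical.update(critical_parties(majority_seats_by_party, threshold))
    return {party: count / len(events) for party, count in times_critical.items()}
```

With a threshold of 251, a winning side of 240 + 20 seats makes both parties critical, while adding a third party with 40 seats leaves only the 240-seat party critical. Over those two events the small party scores 0.5 and the large one 1.0.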
Swing Voters and Close Votes
The module identifies:
- Close votes: vote events decided by a narrow margin, at most CLOSE_VOTES_THRESHOLD (default 10)
- Swing voters: individual legislators whose vote could have changed the outcome
- Top dissenters: legislators who voted against their party line most frequently, ranked by dissent count
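Two of these outputs can be sketched in a few lines. Defining the margin as the absolute difference between Yea and Nay counts is an assumption, as are the function names and the shape of the `party_line` lookup.

```python
from collections import Counter

CLOSE_VOTES_THRESHOLD = 10  # default from config.py


def is_close_vote(yeas, nays, threshold=CLOSE_VOTES_THRESHOLD):
    """A vote is close when the margin does not exceed the threshold."""
    return abs(yeas - nays) <= threshold


def top_dissenters(votes, party_line, party_map, top_n=10):
    """Rank legislators by how often they voted against their party line.

    votes: iterable of (person_id, vote_event_id, option)
    party_line: {(party, vote_event_id): option chosen by the party majority}
    """
    dissent = Counter()
    for person_id, event_id, option in votes:
        line = party_line.get((party_map[person_id], event_id))
        if line is not None and option != line:
            dissent[person_id] += 1
    return dissent.most_common(top_n)
```

Passing TOP_DISSENTERS_GLOBAL or TOP_DISSIDENTS_PER_WINDOW as `top_n` would give the global and per-window rankings respectively.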
Power Comparison
The key output is a four-way comparison:
| Index | Basis | What It Measures |
|---|---|---|
| Nominal (seats) | Seat count | Formal representation |
| Shapley-Shubik | Seat distribution | Bargaining power (DP-based) |
| Banzhaf | Seat distribution | Bargaining power (coalition-based) |
| Empirical | Actual votes | Real-world relevance |
Divergences between nominal and empirical power reveal parties that are formally small but strategically critical (or vice versa).
:::tip
A party with 5% of seats but an empirical power index of 20% is a kingmaker: its votes are disproportionately decisive. This pattern appears when a dominant coalition frequently needs a small party to cross the majority threshold.
:::
Runner Infrastructure
Seven analysis runners (run_*.py) provide CLI access to individual analyses. They share common infrastructure from runner_utils.py:
| Runner | Analysis Module | What it runs |
|---|---|---|
| run_analysis.py | All | Complete analysis pipeline |
| run_nominate.py | nominate.py | W-NOMINATE ideal point estimation |
| run_covotacion_dinamica.py | covotacion_dinamica.py | Time-windowed co-voting |
| run_evolucion_partidos.py | evolucion_partidos.py | Party evolution across legislatures |
| run_efecto_genero.py | efecto_genero.py | Gender effect on voting behavior |
| run_efecto_curul_tipo.py | efecto_curul_tipo.py | Seat type effect on voting |
| run_trayectorias.py | trayectorias.py | Individual legislator trajectories |
All runners support --camara (diputados/senado) and --output-dir flags via runner_utils.build_simple_parser(). The run_for_cameras() helper allows running an analysis for one or both chambers.
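The shared helpers might look like the sketch below. The flag names and chamber values come from the text; the signatures, the default values, and the behavior of running both chambers when --camara is omitted are assumptions.

```python
import argparse


def build_simple_parser(description):
    """Parser with the two flags shared by every runner."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--camara", choices=["diputados", "senado"], default=None,
                        help="chamber to analyze (default: both)")
    parser.add_argument("--output-dir", default="output",
                        help="directory for generated files")
    return parser


def run_for_cameras(analysis, camara=None):
    """Call analysis(camara) for the chosen chamber, or for both when none is given."""
    cameras = [camara] if camara else ["diputados", "senado"]
    return {name: analysis(name) for name in cameras}
```

A runner would then parse its arguments once and hand the chosen chamber to run_for_cameras together with its analysis function.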
Logging
All runners use runner_utils.setup_logging() for consistent log formatting:
2026-04-15 10:30:00 - INFO - analysis.poder_empirico - Starting analysis...
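A format string that reproduces this line with the standard logging module is sketched below. The function signature and the use of logging.basicConfig are assumptions about how setup_logging() is implemented.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging(level=logging.INFO):
    """Configure root logging with the shared runner format."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)
    return logging.getLogger()
```

Each module would then obtain its own logger with `logging.getLogger(__name__)`, which is what places a name such as analysis.poder_empirico in the third field.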
Temporal Dynamics
All methods support temporal analysis across legislatures.
Per-Legislature Analysis
Each legislature gets its own independent analysis:
- Separate W-NOMINATE ideal point spaces
- Independent co-voting graphs and community detection
- Chamber-specific power indices reflecting seat changes
Cross-Legislature Comparison
- nominate_cross_legislatura places all legislators in a shared ideal point space, enabling direct comparison across time periods
- Community evolution tracking: which legislators shift communities between legislatures, and what that implies about coalition realignment
Modularity Trends
Tracking the Louvain modularity score across legislatures reveals changes in voting bloc cohesion:
- Rising modularity: parties are voting more cohesively, sharper partisan divisions
- Declining modularity: cross-party voting increasing, blocs dissolving or realigning
- Sudden drops: may indicate a major legislative event (reform vote, leadership change) that disrupted normal voting patterns