Overview

The Observatorio del Congreso pipeline runs six analysis modules that transform raw roll-call voting data into political structure and power insights. The processing chain goes from individual vote records to ideological positions, co-voting networks, community partitions, centrality rankings, and formal vs. empirical power indices.

| Module | Input | Output | Core Algorithm |
|---|---|---|---|
| W-NOMINATE | Vote matrix (legislators x events) | Ideal point coordinates | SVD + Newton-Raphson |
| Co-Voting | Vote records | NxN similarity matrix + graph | Agreement counting |
| Communities | Co-voting graph | Partition dict (node -> community) | Louvain (nx.community) |
| Centrality | Co-voting graph | Per-node scores | Weighted degree, betweenness |
| Power Indices | Seat counts per party | Shapley-Shubik, Banzhaf values | Dynamic programming O(n²W) |
| Empirical Power | Vote records + seat counts | Critical party frequencies | Roll-call analysis |

Configuration Layer

The analysis pipeline centralizes tuneable parameters in analysis/config.py. This allows adjusting thresholds and algorithm settings without modifying individual modules.

| Parameter | Type | Default | Purpose |
|---|---|---|---|
| MIN_EVENTS_PER_WINDOW | int | 30 | Minimum vote events per time window (windows below this are merged) |
| LOUVAIN_SEED | int | 42 | Random seed for Louvain reproducibility |
| LOUVAIN_RESOLUTION | float | 1.0 | Louvain resolution (> 1.0 = smaller communities) |
| CLOSE_VOTES_THRESHOLD | int | 10 | Maximum margin for “close vote” classification |
| REFORMA_JUDICIAL_VE_IDS | list[str] | ["VE04", "VE05"] | Vote event IDs for Judicial Reform analysis |
| TOP_DISSENTERS_GLOBAL | int | 10 | Number of top dissenters to return (global) |
| TOP_DISSIDENTS_PER_WINDOW | int | 5 | Number of top dissenters per time window |
| DEALIGNMENT_THRESHOLD | float | -0.05 | Minimum co-voting change to detect party dealignment |

Data Access Layer

analysis/db.py centralizes SQLite access for all analysis modules. It provides a connection factory with PRAGMAs configured and five parametrized query functions that eliminate SQL duplication across modules.

Connection Factory

get_connection(db_path=None) returns a SQLite connection with:

  • journal_mode = WAL (concurrent reads)
  • busy_timeout = 5000ms (retry on lock)
  • foreign_keys = ON (enforce referential integrity)
  • row_factory = sqlite3.Row (dict-like column access)
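
A minimal sketch of what such a factory might look like, using only the standard sqlite3 module. The `DEFAULT_DB_PATH` constant and its value are hypothetical (the real default path is not given here); the four settings mirror the list above.

```python
import sqlite3

# Hypothetical default location; the real path is not stated in this document.
DEFAULT_DB_PATH = "congreso.db"


def get_connection(db_path=None):
    """Open a SQLite connection with the PRAGMAs listed above applied."""
    conn = sqlite3.connect(db_path or DEFAULT_DB_PATH)
    conn.row_factory = sqlite3.Row               # dict-like column access
    conn.execute("PRAGMA journal_mode = WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5 s on a locked DB
    conn.execute("PRAGMA foreign_keys = ON")     # enforce referential integrity
    return conn
```

Note that WAL mode only applies to file-backed databases; an in-memory database reports its journal mode as `memory`.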

Query Functions

| Function | Parameters | Returns |
|---|---|---|
| get_vote_events() | legislatura, organization_id, result | Filtered vote_event rows |
| get_votes() | vote_event_id, voter_id, join_vote_event | Vote rows with optional legislatura/chamber |
| get_persons() | person_id | Person rows |
| get_organizations() | clasificacion | Organization rows |
| get_memberships() | person_id, org_id, rol | Membership rows |

All functions use parameterized queries (no string interpolation) and proper connection lifecycle management (try/finally close).
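
The pattern described above could look roughly like the sketch below. It is an illustration, not the module's actual code: the explicit `db_path` argument, the `vote_event` table name, and the column names are assumptions inferred from the parameter list.

```python
import sqlite3


def get_vote_events(db_path, legislatura=None, organization_id=None, result=None):
    """Return vote_event rows matching whichever filters are not None."""
    clauses, params = [], []
    # Each clause is a fixed literal; only the values vary, and they are bound.
    for clause, value in (
        ("legislatura = ?", legislatura),
        ("organization_id = ?", organization_id),
        ("result = ?", result),
    ):
        if value is not None:
            clauses.append(clause)
            params.append(value)
    sql = "SELECT * FROM vote_event"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)

    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()  # runs even if the query raises
```

Because values only ever travel through `?` placeholders, a filter value containing quotes is matched literally instead of altering the query.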

Shared Constants

analysis/constants.py centralizes party colors, name mappings, and canonical ordering used across all visualization and analysis modules.

| Constant | Type | Purpose |
|---|---|---|
| PARTY_COLORS | dict[str, str] | Matplotlib colors per party (key = uppercase short name) |
| DEFAULT_COLOR | str | Fallback color (#CCCCCC) |
| ORG_TO_SHORT | dict[str, str] | org_id → short name mapping (O01 → MORENA) |
| PARTY_ORDER | list[str] | Canonical party ordering for visualizations |
| COMMON_PARTIES | list[str] | Parties present in both chambers |
| CAMARA_MAP | dict[str, str] | Chamber name → code (diputados → D) |
| COLORES_WEB | dict[str, str] | ECharts-compatible colors for web export |
| PARTIDO_MAP | dict[str, str] | Full name → abbreviation for JSON export |

W-NOMINATE: Ideal Point Estimation

W-NOMINATE (Weighted Nominal Three-Step Estimation) is the standard algorithm for estimating legislator ideological positions from roll call votes, developed by Poole & Rosenthal (1985, 1997).

References:

  • Poole & Rosenthal (1985). “A Spatial Model for Legislative Roll Call Analysis”. American Journal of Political Science, 29(2), 357-384.
  • Poole & Rosenthal (1997). Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.
  • Poole (2005). Spatial Models of Parliamentary Voting. Cambridge University Press.

How It Works

The algorithm takes a binary vote matrix and recovers legislator ideal points in a low-dimensional policy space.

Step 1: Binarization. Raw vote strings are mapped to binary values:

| Vote type | Binary value |
|---|---|
| a_favor | 1 (Yea) |
| en_contra | 0 (Nay) |
| abstencion | NaN (excluded) |
| ausente | NaN (excluded) |

Step 2: Filtering. Low-information votes and inactive legislators are removed:

min_votes = 10           # minimum binary votes per legislator
min_participants = 10    # minimum binary participants per vote event
lopsided_threshold = 0.975  # filter near-unanimous votes
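
Steps 1 and 2 can be sketched with pandas as follows. This is an illustration under stated assumptions: the `option` column name, the order in which the three filters are applied, and the reading of `lopsided_threshold` as "majority share above 97.5%" are not confirmed by the module itself.

```python
import numpy as np
import pandas as pd

# Mapping from the binarization table above.
VOTE_TO_BINARY = {"a_favor": 1.0, "en_contra": 0.0,
                  "abstencion": np.nan, "ausente": np.nan}


def build_vote_matrix(votes, min_votes=10, min_participants=10,
                      lopsided_threshold=0.975):
    """Return a legislators x events matrix of 1.0 / 0.0 / NaN."""
    votes = votes.assign(value=votes["option"].map(VOTE_TO_BINARY))
    matrix = votes.pivot(index="voter_id", columns="vote_event_id", values="value")

    # Drop vote events with too few binary (Yea/Nay) participants.
    matrix = matrix.loc[:, matrix.notna().sum(axis=0) >= min_participants]

    # Drop near-unanimous events: majority share above the threshold.
    yea_share = matrix.mean(axis=0)  # NaN entries are skipped
    majority_share = np.maximum(yea_share, 1.0 - yea_share)
    matrix = matrix.loc[:, majority_share <= lopsided_threshold]

    # Drop legislators with too few remaining binary votes.
    return matrix.loc[matrix.notna().sum(axis=1) >= min_votes]
```

The input is expected to hold one row per (voter_id, vote_event_id) pair; `pivot` raises if a pair appears twice.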

Step 3: Estimation. The algorithm estimates three groups of parameters, with two coordinates per legislator (a 2D policy space) and a normal vector per vote event:

  • Ideal points (x_i, y_i): each legislator’s position in the policy space
  • Salience weights (beta): how sharply a legislator’s utility drops with distance from the cutting plane
  • Normal vectors (w_j): define the cutting plane separating Yea from Nay for each vote

Initialization uses a singular value decomposition (SVD) of the binary matrix; Newton-Raphson optimization then maximizes the classification likelihood.
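
To make the initialization step concrete, here is one possible way to derive starting coordinates from an SVD. It is a sketch, not the module's actual routine: filling missing votes with the event mean, centering per event, rescaling into the unit circle, and using `numpy.linalg.svd` in place of the scipy call are all assumptions.

```python
import numpy as np


def svd_initial_points(vote_matrix, dims=2):
    """Starting coordinates from an SVD of the centered vote matrix."""
    x = np.asarray(vote_matrix, dtype=float)
    # Replace missing votes with the event's mean so they carry no signal.
    col_means = np.nanmean(x, axis=0)
    x = np.where(np.isnan(x), col_means, x)
    x = x - x.mean(axis=0)  # center each vote event
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :dims] * s[:dims]
    # Rescale so every legislator starts inside the unit circle.
    max_norm = np.linalg.norm(coords, axis=1).max()
    return coords / max_norm if max_norm > 0 else coords
```

The sign of each dimension is arbitrary after an SVD, so only relative positions are meaningful at this stage.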

Quality Metrics

| Metric | Description | Interpretation |
|---|---|---|
| Classification rate | % of votes correctly predicted by the model | Higher = better fit; typical range 85-95% |
| APRE | Aggregate Proportional Reduction in Error | Improvement over baseline (majority prediction); 0.0 = no improvement, 1.0 = perfect |
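
Both metrics follow directly from comparing observed and predicted votes. The sketch below uses the standard APRE definition (minority votes minus classification errors, summed over events and divided by total minority votes); the function name and the matrix layout are illustrative assumptions.

```python
import numpy as np


def classification_metrics(observed, predicted):
    """Classification rate and APRE for legislators x events matrices.

    observed holds 1.0 / 0.0 / NaN; predicted holds 1.0 / 0.0.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    valid = ~np.isnan(observed)

    errors = ((observed != predicted) & valid).sum(axis=0)  # per event
    yeas = np.nansum(observed, axis=0)
    nays = valid.sum(axis=0) - yeas
    minority = np.minimum(yeas, nays)  # errors made by the majority baseline

    classification_rate = 1.0 - errors.sum() / valid.sum()
    apre = (minority.sum() - errors.sum()) / minority.sum()
    return classification_rate, apre
```

APRE is undefined when every event is unanimous (the minority total is zero), which is one reason lopsided votes are filtered out beforehand.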

Implementation Variants

  • nominate_by_legislatura: runs W-NOMINATE separately for each legislature, producing independent ideal point spaces per period
  • nominate_cross_legislatura: combines all legislatures into a single run, placing all legislators in a shared space for direct comparison

Dependencies

scipy (svd, minimize, norm), numpy, pandas

Co-Voting Analysis

Co-voting measures how often each pair of legislators votes the same way. This is the foundation for all network-based analysis.

Construction Pipeline

  1. Load data: votes, persons, and organizations from SQLite
  2. Party normalization: normalize_party() maps mixed vote.group values to canonical organization IDs
  3. Primary party assignment: get_primary_party() assigns each legislator to their most frequent party
  4. Build matrix: build_covotacion_matrix() produces an NxN numpy matrix where entry (i,j) = agreement count between legislators i and j, normalized to 0-1
  5. Build graph: build_graph() converts the matrix to a NetworkX graph with:
    • Nodes: legislators, with attributes for party, gender
    • Edges: co-voting pairs, with weight = normalized similarity
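
# Simplified co-voting weight calculation.
# votes[i] is assumed to be the set of (vote_event_id, option) pairs cast by legislator i.
for i, j in legislator_pairs:
    shared_votes = votes[i].intersection(votes[j])  # same option on the same event
    total_votes = votes[i].union(votes[j])
    weight = len(shared_votes) / len(total_votes) if total_votes else 0.0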
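
A runnable version of the same idea, building the full NxN matrix. This is a sketch rather than the real build_covotacion_matrix(): the input shape (person_id mapped to a set of (vote_event_id, option) pairs), the function name, and the zero diagonal are assumptions.

```python
from itertools import combinations

import numpy as np


def build_covoting_matrix(votes_by_legislator):
    """votes_by_legislator: person_id -> set of (vote_event_id, option) pairs."""
    ids = sorted(votes_by_legislator)
    index = {pid: k for k, pid in enumerate(ids)}
    matrix = np.zeros((len(ids), len(ids)))
    for a, b in combinations(ids, 2):
        shared = votes_by_legislator[a] & votes_by_legislator[b]
        total = votes_by_legislator[a] | votes_by_legislator[b]
        weight = len(shared) / len(total) if total else 0.0
        matrix[index[a], index[b]] = matrix[index[b], index[a]] = weight
    return ids, matrix
```

With this set-based formulation a disagreement adds two entries to the union, so each disagreement weighs twice in the denominator; dividing agreements by the number of jointly voted events would be a milder alternative.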

Output

The module returns a dict containing:

| Key | Type | Description |
|---|---|---|
| matrix | NxN numpy array | Pairwise co-voting similarity |
| graph | networkx.Graph | Weighted co-voting network |
| party_map | dict | person_id -> party mapping |
| org_map | dict | org_id -> party name mapping |
| persons_df | DataFrame | Legislator metadata |

Community Detection (Louvain)

The Louvain algorithm detects communities of legislators who vote similarly, going beyond formal party labels to reveal actual voting blocs.

Algorithm

Louvain performs two-phase iterative optimization:

  1. Local moving: each node moves to the neighbor community that yields the largest modularity gain
  2. Aggregation: communities are collapsed into super-nodes, and the process repeats

The resolution parameter controls community granularity:

| Resolution | Effect |
|---|---|
| < 1.0 | Fewer, larger communities (coarser) |
| 1.0 (default) | Standard modularity |
| > 1.0 | More, smaller communities (finer) |

Implementation

The module uses NetworkX’s built-in Louvain implementation (nx.community.louvain_communities, available since NetworkX 2.7), configured via the shared config.py parameters:

communities = nx.community.louvain_communities(
    graph,
    weight="weight",
    resolution=config.LOUVAIN_RESOLUTION,
    seed=config.LOUVAIN_SEED,  # 42 for reproducibility
)

:::note
The seed parameter ensures reproducible community partitions across runs. The resolution value can be tuned in config.py without modifying the detection module.
:::

Output

detect_communities() returns a partition dict mapping each node_id to a community_id.

analyze_communities() produces detailed analysis per community:

  • Party composition: count and percentage of each party within the community
  • Purity metric: percentage of the dominant party (100% = pure party bloc)
  • Cross-party legislators: individuals whose community differs from their formal party
  • Sub-blocks: detection of internal factions within large parties (specifically MORENA sub-bloques)

:::tip
A community with purity below 70% signals a genuine cross-party coalition, not just a party label. These mixed communities often reveal real legislative alliances.
:::
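
The partition dict and the purity metric can be illustrated with plain Python. The helper names `to_partition` and `community_purity` are hypothetical, and the report layout is an assumption; only the definitions (node to community mapping, dominant-party percentage) come from the description above.

```python
from collections import Counter, defaultdict


def to_partition(communities):
    """Convert a list of node sets into a node_id -> community_id dict."""
    return {node: cid for cid, nodes in enumerate(communities) for node in nodes}


def community_purity(partition, party_map):
    """Per community: size, dominant party, purity (%), party counts."""
    members = defaultdict(list)
    for node, cid in partition.items():
        members[cid].append(party_map[node])
    report = {}
    for cid, parties in members.items():
        counts = Counter(parties)
        dominant, top = counts.most_common(1)[0]
        report[cid] = {
            "size": len(parties),
            "dominant_party": dominant,
            "purity": 100.0 * top / len(parties),
            "composition": dict(counts),
        }
    return report
```

`to_partition` accepts the list of sets that nx.community.louvain_communities returns, so the two helpers can be chained directly after detection.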

Dependencies

networkx (nx.community.louvain_communities is built in since NetworkX 2.7)

Centrality Metrics

Centrality identifies structurally important legislators in the co-voting network. Two complementary metrics are used.

Weighted Degree Centrality

centrality[node] = weighted_degree(node) / max_weighted_degree

Each node’s weighted degree is the sum of its edge weights (total co-voting intensity with all other legislators), normalized by the maximum weighted degree in the graph. Values range from 0.0 to 1.0.

Interpretation: high weighted degree = legislator co-votes heavily with many others, indicating alignment with the dominant coalition.

Betweenness Centrality

betweenness = nx.betweenness_centrality(graph, weight=None)

Betweenness is computed unweighted (weight=None). This is a deliberate choice:

:::tip
Co-voting weights are similarity measures, not geodesic distances. Higher weight = more similar = closer. If passed as weight to NetworkX, the algorithm would interpret them as costs, treating strongly co-voting pairs as far apart. This inverts the intended interpretation. Using weight=None counts each edge as one hop, correctly identifying legislators who bridge distinct voting blocs.
:::

Interpretation: high betweenness = legislator sits on the shortest paths between different communities, acting as a potential broker or swing voter.

| Metric | Weight handling | Captures |
|---|---|---|
| Weighted Degree | Uses weights | Overall co-voting intensity |
| Betweenness | Ignores weights (weight=None) | Structural bridging position |
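
Both metrics can be tried on a toy graph. The sketch below assumes a `centrality_scores` helper (a hypothetical name) and a made-up network of two triangles joined by one weak tie; the NetworkX calls are the ones named above.

```python
import networkx as nx


def centrality_scores(graph):
    """Normalized weighted degree and unweighted betweenness per node."""
    weighted_degree = dict(graph.degree(weight="weight"))
    max_degree = max(weighted_degree.values(), default=0.0)
    degree_centrality = {
        node: (value / max_degree if max_degree else 0.0)
        for node, value in weighted_degree.items()
    }
    # weight=None: every edge counts as one hop (similarities are not distances)
    betweenness = nx.betweenness_centrality(graph, weight=None)
    return degree_centrality, betweenness


# Two tightly co-voting triangles joined by a single weak tie c-d.
g = nx.Graph()
g.add_weighted_edges_from([
    ("a", "b", 0.9), ("a", "c", 0.9), ("b", "c", 0.9),
    ("d", "e", 0.9), ("d", "f", 0.9), ("e", "f", 0.9),
    ("c", "d", 0.2),
])
degree_centrality, betweenness = centrality_scores(g)
```

In this toy graph the bridging nodes c and d get the highest betweenness even though their connecting tie is the weakest edge, which is exactly the structural reading the unweighted variant is meant to capture.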

Power Indices (Nominal)

Nominal power indices calculate how much bargaining power each party has based solely on seat counts, assuming all members vote with their party.

Shapley-Shubik Index

For a party p, the Shapley-Shubik index computes marginal power using dynamic programming instead of brute-force permutation enumeration:

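from math import factorial

def shapley_shubik(player_weights, quota):
    n = len(player_weights)
    results = {}
    for i in range(n):
        others = player_weights[:i] + player_weights[i + 1:]
        # dp[s][w] = number of subsets of size s
        #            with total weight w (excluding player i);
        #            totals at or above the quota are never pivotal, so skip them
        dp = [[0] * quota for _ in range(n)]
        dp[0][0] = 1
        for weight in others:
            for s in range(n - 2, -1, -1):  # descending: each party used once
                for w in range(quota - weight - 1, -1, -1):
                    dp[s + 1][w + weight] += dp[s][w]
        ss_i = 0
        for s in range(n):
            for w in range(quota):
                # player i is the pivot: below quota before, at/above after
                if w + player_weights[i] >= quota:
                    ss_i += dp[s][w] * factorial(s) * factorial(n - 1 - s)
        results[i] = ss_i / factorial(n)
    return results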

A party is critical (a pivot) when it joins a coalition that is below the quota, and its weight pushes the total to or above the quota. The DP table counts how many coalition configurations make each party critical.
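
The pivot definition can be checked on a small case by brute force. The sketch below enumerates every ordering, which is only practical for a handful of parties; it is meant as a cross-check for a DP implementation, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik_bruteforce(player_weights, quota):
    """Exact Shapley-Shubik indices by enumerating every ordering."""
    n = len(player_weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for player in order:
            running += player_weights[player]
            if running >= quota:  # this player tips the coalition over
                pivots[player] += 1
                break
    return [Fraction(count, orderings) for count in pivots]
```

For weights [2, 1, 1] and quota 3, the large party is the pivot in four of the six orderings and each small party in one, giving indices 2/3, 1/6, and 1/6.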

Complexity: O(n²W) where n = number of parties and W = quota. For 13 parties with quota ~251, n²W is roughly 42,000 table updates per player (13² × 251), compared to 6.2 billion orderings (13!) with brute-force permutation enumeration.

Banzhaf Index

For a party p, count all winning coalitions where p is critical (its defection flips the outcome):

for coalition in all_subsets(parties):
    if coalition is winning AND coalition - {party} is losing:
        party is critical in this coalition
banzhaf_index[party] = critical_count[party] / total_critical_count
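
The pseudocode above translates into a short runnable function. This is a sketch that enumerates every coalition with itertools; the function name and the dict-based input (party mapped to seat count) are assumptions.

```python
from itertools import combinations


def banzhaf_index(seats, quota):
    """Normalized Banzhaf index by enumerating every coalition.

    seats: party -> seat count. Exponential in the number of parties,
    which is fine for roughly a dozen parties (2**13 = 8192 coalitions).
    """
    parties = list(seats)
    critical = dict.fromkeys(parties, 0)
    for size in range(1, len(parties) + 1):
        for coalition in combinations(parties, size):
            total = sum(seats[p] for p in coalition)
            if total < quota:
                continue  # losing coalition: nobody is critical
            for p in coalition:
                if total - seats[p] < quota:  # defection flips the outcome
                    critical[p] += 1
    total_critical = sum(critical.values())
    return {p: (c / total_critical if total_critical else 0.0)
            for p, c in critical.items()}
```

With seats {A: 2, B: 1, C: 1} and quota 3, party A is critical in three winning coalitions and B and C in one each, so the normalized indices are 0.6, 0.2, and 0.2.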

Seat Assignment

Multi-membership (legislators belonging to more than one party across their career) is resolved by:

  1. Collect all party memberships for each legislator
  2. Assign to the party where the legislator cast the most votes
  3. Ties broken by most recent start_date membership
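
The three rules can be sketched for a single legislator as follows. The function name and input shapes are hypothetical: vote-level party labels as a list, and memberships as (party, start_date) tuples with ISO-formatted dates so that string comparison orders them chronologically.

```python
from collections import Counter


def assign_primary_party(vote_parties, memberships):
    """Pick one party for a legislator with several memberships.

    vote_parties: party label recorded on each vote the legislator cast.
    memberships: non-empty list of (party, start_date) tuples, ISO dates.
    """
    counts = Counter(vote_parties)
    candidates = {party for party, _ in memberships}
    best = max(counts.get(p, 0) for p in candidates)
    tied = [p for p in candidates if counts.get(p, 0) == best]
    if len(tied) == 1:
        return tied[0]
    # Tie-break: the tied party with the most recent membership start date.
    latest_start = {}
    for party, start in memberships:
        if party in tied:
            latest_start[party] = max(start, latest_start.get(party, start))
    return max(tied, key=lambda p: latest_start[p])
```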

Per-Chamber Analysis

Both indices support separate analysis for Diputados (camara='D') and Senado (camara='S').

Empirical Power Analysis

Nominal power assumes party discipline. Empirical power measures what actually happens in roll-call votes.

Critical Parties per Vote

For each vote event, the module identifies which parties were necessary to reach the majority threshold:

winning_coalition = parties that voted with the majority
for party in winning_coalition:
    seats_without = majority_seats - party_seats
    if seats_without < majority_threshold:
        party is "critical" for this vote

Empirical Power Index

empirical_power[party] = times_critical[party] / total_vote_events

This produces a 0.0-1.0 score reflecting how often a party’s votes were actually decisive.
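
Both steps can be combined in a small sketch. The input shape (one dict per vote event mapping each party to the votes it contributed to the winning side), the single fixed threshold, and the seat numbers in the usage example are illustrative assumptions, not real data.

```python
from collections import Counter


def critical_parties(winning_votes, majority_threshold):
    """Parties whose removal drops the winning side below the threshold.

    winning_votes: party -> votes that party contributed to the winning side.
    """
    majority_seats = sum(winning_votes.values())
    return {
        party
        for party, party_seats in winning_votes.items()
        if majority_seats - party_seats < majority_threshold
    }


def empirical_power(vote_events, majority_threshold):
    """times_critical[party] / total_vote_events for every party seen."""
    times_critical = Counter()
    parties = set()
    for winning_votes in vote_events:
        parties.update(winning_votes)
        times_critical.update(critical_parties(winning_votes, majority_threshold))
    total = len(vote_events)
    return {p: times_critical[p] / total for p in parties} if total else {}


# Made-up numbers: a small party is critical only when the margin is thin.
events = [
    {"MORENA": 230, "PVEM": 40},
    {"MORENA": 250, "PAN": 100, "PVEM": 40},
]
power = empirical_power(events, majority_threshold=251)
```

In the first event neither party reaches 251 alone, so both are critical; in the second only the largest party is.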

Swing Voters and Close Votes

The module identifies:

  • Close votes: vote events where the margin was narrow (near the majority threshold)
  • Swing voters: individual legislators whose vote could have changed the outcome
  • Top dissenters: legislators who voted against their party line most frequently, ranked by dissent count
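
A minimal sketch of two of these classifications. It assumes that the margin is the absolute difference between Yea and Nay counts and that the party line is the party's most common option in each vote event; neither definition is confirmed by the module, and the tuple layout is hypothetical.

```python
from collections import Counter, defaultdict

CLOSE_VOTES_THRESHOLD = 10  # default from the configuration table


def is_close_vote(a_favor, en_contra, threshold=CLOSE_VOTES_THRESHOLD):
    """A vote is close when the yea/nay margin is at most the threshold."""
    return abs(a_favor - en_contra) <= threshold


def top_dissenters(votes, top_n=10):
    """Rank legislators by how often they voted against their party line.

    votes: list of (vote_event_id, voter_id, party, option) tuples.
    The party line is taken to be the party's most common option per event.
    """
    by_event_party = defaultdict(Counter)
    for event, _, party, option in votes:
        by_event_party[event, party][option] += 1
    party_line = {key: counts.most_common(1)[0][0]
                  for key, counts in by_event_party.items()}

    dissent = Counter()
    for event, voter, party, option in votes:
        if option != party_line[event, party]:
            dissent[voter] += 1
    return dissent.most_common(top_n)
```

The `top_n` default matches TOP_DISSENTERS_GLOBAL; a per-window ranking would call the same function on each window's votes with `top_n=5`.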

Power Comparison

The key output is a four-way comparison:

| Index | Basis | What It Measures |
|---|---|---|
| Nominal (seats) | Seat count | Formal representation |
| Shapley-Shubik | Seat distribution | Bargaining power (DP-based) |
| Banzhaf | Seat distribution | Bargaining power (coalition-based) |
| Empirical | Actual votes | Real-world relevance |

Divergences between nominal and empirical power reveal parties that are formally small but strategically critical (or vice versa).

:::tip
A party with 5% of seats but an empirical power index of 20% is a kingmaker: its votes are disproportionately decisive. This pattern appears when a dominant coalition frequently needs a small party to cross the majority threshold.
:::

Runner Infrastructure

Seven analysis runners (run_*.py) provide CLI access to individual analyses. They share common infrastructure from runner_utils.py:

| Runner | Analysis Module | What it runs |
|---|---|---|
| run_analysis.py | All | Complete analysis pipeline |
| run_nominate.py | nominate.py | W-NOMINATE ideal point estimation |
| run_covotacion_dinamica.py | covotacion_dinamica.py | Time-windowed co-voting |
| run_evolucion_partidos.py | evolucion_partidos.py | Party evolution across legislatures |
| run_efecto_genero.py | efecto_genero.py | Gender effect on voting behavior |
| run_efecto_curul_tipo.py | efecto_curul_tipo.py | Seat type effect on voting |
| run_trayectorias.py | trayectorias.py | Individual legislator trajectories |

All runners support --camara (diputados/senado) and --output-dir flags via runner_utils.build_simple_parser(). The run_for_cameras() helper allows running an analysis for one or both chambers.
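
The shared helpers might be structured along these lines. Only the flag names and the chamber codes come from this document; the function signatures, defaults, and help strings below are hypothetical.

```python
import argparse


def build_simple_parser(description):
    """Parser exposing the two flags shared by every runner."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--camara", choices=["diputados", "senado"], default=None,
                        help="chamber to analyze; omit to run both")
    parser.add_argument("--output-dir", default="output",
                        help="directory for generated artifacts")
    return parser


def run_for_cameras(run_one, camara=None):
    """Call run_one(code) for the selected chamber, or for both if None."""
    camara_map = {"diputados": "D", "senado": "S"}
    selected = [camara] if camara else list(camara_map)
    return {name: run_one(camara_map[name]) for name in selected}
```

A runner would then parse its arguments once and pass `args.camara` to `run_for_cameras` together with the function that performs the analysis for one chamber code.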

Logging

All runners use runner_utils.setup_logging() for consistent log formatting:

2026-04-15 10:30:00 - INFO - analysis.poder_empirico - Starting analysis...
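
A log line in that shape can be produced with the standard logging module. The format strings below are inferred from the sample line, and the `setup_logging` signature is an assumption.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging(level=logging.INFO):
    """Configure the root logger with the shared runner format."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)


setup_logging()
logging.getLogger("analysis.poder_empirico").info("Starting analysis...")
```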

Temporal Dynamics

All methods support temporal analysis across legislatures.

Per-Legislature Analysis

Each legislature gets its own independent analysis:

  • Separate W-NOMINATE ideal point spaces
  • Independent co-voting graphs and community detection
  • Chamber-specific power indices reflecting seat changes

Cross-Legislature Comparison

  • nominate_cross_legislatura places all legislators in a shared ideal point space, enabling direct comparison across time periods
  • Community evolution tracking: which legislators shift communities between legislatures, and what that implies about coalition realignment

Tracking the Louvain modularity score across legislatures reveals changes in voting bloc cohesion:

  • Rising modularity: parties are voting more cohesively, sharper partisan divisions
  • Declining modularity: cross-party voting increasing, blocs dissolving or realigning
  • Sudden drops: may indicate a major legislative event (reform vote, leadership change) that disrupted normal voting patterns
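
Modularity tracking can be sketched with NetworkX as below. The `modularity_by_legislatura` helper and its input (a dict of per-legislature co-voting graphs) are hypothetical; the defaults mirror LOUVAIN_RESOLUTION and LOUVAIN_SEED from the configuration table.

```python
import networkx as nx


def modularity_by_legislatura(graphs, resolution=1.0, seed=42):
    """Louvain modularity for each legislature's co-voting graph.

    graphs: legislatura label -> weighted networkx.Graph.
    """
    scores = {}
    for legislatura, graph in graphs.items():
        communities = nx.community.louvain_communities(
            graph, weight="weight", resolution=resolution, seed=seed
        )
        scores[legislatura] = nx.community.modularity(
            graph, communities, weight="weight", resolution=resolution
        )
    return scores
```

Comparing the returned scores in chronological order gives the rising or declining trend described above; a graph where everyone co-votes uniformly with everyone else scores at or below zero, while clearly separated blocs score well above it.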