Analysis Methods — CachorroSpace

Overview

The Observatorio del Congreso pipeline runs six analysis modules that transform raw roll-call voting data into political structure and power insights. The processing chain goes from individual vote records to ideological positions, co-voting networks, community partitions, centrality rankings, and formal vs. empirical power indices.

Module	Input	Output	Core Algorithm
W-NOMINATE	Vote matrix (legislators x events)	Ideal point coordinates	SVD + Newton-Raphson
Co-Voting	Vote records	NxN similarity matrix + graph	Agreement counting
Communities	Co-voting graph	Partition dict (node -> community)	Louvain (nx.community)
Centrality	Co-voting graph	Per-node scores	Weighted degree, betweenness
Power Indices	Seat counts per party	Shapley-Shubik, Banzhaf values	Dynamic programming O(n²W)
Empirical Power	Vote records + seat counts	Critical party frequencies	Roll-call analysis

Configuration Layer

The analysis pipeline centralizes tuneable parameters in analysis/config.py. This allows adjusting thresholds and algorithm settings without modifying individual modules.

Parameter	Type	Default	Purpose
`MIN_EVENTS_PER_WINDOW`	int	30	Minimum vote events per time window (windows below this are merged)
`LOUVAIN_SEED`	int	42	Random seed for Louvain reproducibility
`LOUVAIN_RESOLUTION`	float	1.0	Louvain resolution (> 1.0 = smaller communities)
`CLOSE_VOTES_THRESHOLD`	int	10	Maximum margin for “close vote” classification
`REFORMA_JUDICIAL_VE_IDS`	list[str]	["VE04", "VE05"]	Vote event IDs for Judicial Reform analysis
`TOP_DISSENTERS_GLOBAL`	int	10	Number of top dissenters to return (global)
`TOP_DISSIDENTS_PER_WINDOW`	int	5	Number of top dissenters per time window
`DEALIGNMENT_THRESHOLD`	float	-0.05	Minimum co-voting change to detect party dealignment

Data Access Layer

analysis/db.py centralizes SQLite access for all analysis modules. It provides a connection factory that configures the required PRAGMAs and five parameterized query functions that eliminate SQL duplication across modules.

Connection Factory

get_connection(db_path=None) returns a SQLite connection with:

journal_mode = WAL (concurrent reads)
busy_timeout = 5000ms (retry on lock)
foreign_keys = ON (enforce referential integrity)
row_factory = sqlite3.Row (dict-like column access)
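
A minimal sketch of such a factory, using only the standard `sqlite3` module. The name `open_connection` and the `:memory:` default are placeholders for illustration; how the real `get_connection()` resolves `db_path=None` is not shown here.

```python
import sqlite3

def open_connection(db_path=":memory:"):
    """Hypothetical stand-in for get_connection(); applies the settings listed above."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row               # dict-like column access
    conn.execute("PRAGMA journal_mode = WAL")    # has no effect on in-memory databases
    conn.execute("PRAGMA busy_timeout = 5000")   # milliseconds
    conn.execute("PRAGMA foreign_keys = ON")
    return conn

conn = open_connection()
try:
    row = conn.execute("SELECT 1 AS answer").fetchone()
    print(row["answer"])
finally:
    conn.close()
```

Note that `foreign_keys` is a per-connection setting in SQLite, which is one reason to route every module through a single factory.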

Query Functions

Function	Parameters	Returns
`get_vote_events()`	legislatura, organization_id, result	Filtered vote_event rows
`get_votes()`	vote_event_id, voter_id, join_vote_event	Vote rows, optionally with legislatura/camara from the joined vote event
`get_persons()`	person_id	Person rows
`get_organizations()`	clasificacion	Organization rows
`get_memberships()`	person_id, org_id, rol	Membership rows

All functions use parameterized queries (no string interpolation) and proper connection lifecycle management (try/finally close).
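
The pattern can be sketched as follows. The function name `fetch_vote_events`, the `vote_event` table name, and the column names are assumptions made for illustration, not the actual schema or API of analysis/db.py.

```python
import sqlite3

def fetch_vote_events(db_path, legislatura=None, result=None):
    """Hypothetical query helper: optional filters become bound parameters."""
    clauses, params = [], []
    if legislatura is not None:
        clauses.append("legislatura = ?")
        params.append(legislatura)
    if result is not None:
        clauses.append("result = ?")
        params.append(result)
    sql = "SELECT * FROM vote_event"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()  # runs even if the query raises
```

Only fixed clause text is concatenated into the SQL string; caller-supplied values always travel through `?` placeholders.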

Shared Constants

analysis/constants.py centralizes party colors, name mappings, and canonical ordering used across all visualization and analysis modules.

Constant	Type	Purpose
`PARTY_COLORS`	dict[str, str]	Matplotlib colors per party (key = uppercase short name)
`DEFAULT_COLOR`	str	Fallback color (#CCCCCC)
`ORG_TO_SHORT`	dict[str, str]	org_id → short name mapping (O01 → MORENA)
`PARTY_ORDER`	list[str]	Canonical party ordering for visualizations
`COMMON_PARTIES`	list[str]	Parties present in both chambers
`CAMARA_MAP`	dict[str, str]	Chamber name → code (diputados → D)
`COLORES_WEB`	dict[str, str]	ECharts-compatible colors for web export
`PARTIDO_MAP`	dict[str, str]	Full name → abbreviation for JSON export

W-NOMINATE: Ideal Point Estimation

W-NOMINATE (Weighted Nominal Three-Step Estimation) is the standard algorithm for estimating legislator ideological positions from roll call votes, developed by Poole & Rosenthal (1985, 1997).

References:

Poole & Rosenthal (1985). “A Spatial Model for Legislative Roll Call Analysis”. American Journal of Political Science, 29(2), 357-384.
Poole & Rosenthal (1997). Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.
Poole (2005). Spatial Models of Parliamentary Voting. Cambridge University Press.

How It Works

The algorithm takes a binary vote matrix and recovers legislator ideal points in a low-dimensional policy space.

Step 1: Binarization. Raw vote strings are mapped to binary values:

Vote type	Binary value
`a_favor`	1 (Yea)
`en_contra`	0 (Nay)
`abstencion`	NaN (excluded)
`ausente`	NaN (excluded)
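
The mapping in the table can be expressed as a small lookup. Treating unknown labels as NaN is an assumption of this sketch, not documented behavior.

```python
import math

# Mapping taken from the table above; NaN marks votes excluded from estimation.
VOTE_TO_BINARY = {
    "a_favor": 1.0,
    "en_contra": 0.0,
    "abstencion": math.nan,
    "ausente": math.nan,
}

def binarize(vote):
    """Return 1.0 (Yea), 0.0 (Nay) or NaN; unknown labels also map to NaN (assumption)."""
    return VOTE_TO_BINARY.get(vote, math.nan)

row = [binarize(v) for v in ["a_favor", "ausente", "en_contra"]]
print(row)  # [1.0, nan, 0.0]
```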

Step 2: Filtering. Low-information votes and inactive legislators are removed:

min_votes = 10           # minimum binary votes per legislator
min_participants = 10    # minimum binary participants per vote event
lopsided_threshold = 0.975  # filter near-unanimous votes
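
One way these three thresholds could be applied is sketched below. The order (vote events first, then legislators), the single pass, and the strict `>` comparison for lopsided votes are assumptions; the actual module may iterate or compare differently.

```python
def filter_votes(matrix, min_votes=10, min_participants=10, lopsided_threshold=0.975):
    """matrix[i][j] is 1, 0 or None for legislator i on vote event j.

    Returns (kept_legislators, kept_events) as lists of indices.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    kept_events = []
    for j in range(n_cols):
        cast = [matrix[i][j] for i in range(n_rows) if matrix[i][j] is not None]
        if len(cast) < min_participants:
            continue  # too few binary participants
        yea_share = sum(cast) / len(cast)
        if max(yea_share, 1 - yea_share) > lopsided_threshold:
            continue  # near-unanimous, carries little information
        kept_events.append(j)
    kept_legislators = [
        i for i in range(n_rows)
        if sum(matrix[i][j] is not None for j in kept_events) >= min_votes
    ]
    return kept_legislators, kept_events
```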

Step 3: Estimation. Working in a 2D policy space, the algorithm estimates two coordinates per legislator and a two-component normal vector per vote event, along with salience weights:

Ideal points (x_i, y_i): each legislator’s position in the policy space
Salience weights (beta): how sharply a legislator’s utility drops with distance from the cutting plane
Normal vectors (w_j): define the cutting plane separating Yea from Nay for each vote

Initialization uses a singular value decomposition (SVD) of the binary matrix; Newton-Raphson optimization then maximizes the classification likelihood.

Quality Metrics

Metric	Description	Interpretation
Classification rate	% of votes correctly predicted by the model	Higher = better fit; typical range 85-95%
APRE	Aggregate Proportional Reduction in Error	Improvement over baseline (majority prediction); 0.0 = no improvement, 1.0 = perfect
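
Both metrics can be computed from observed and predicted votes. The sketch below uses the usual aggregate definition of APRE, where the baseline error for each vote event is the size of its minority side. The function name and the input layout are illustrative assumptions.

```python
def fit_metrics(observed, predicted):
    """observed/predicted: one list per vote event, holding 1, 0 or None (excluded)."""
    correct = total = minority_total = 0
    for obs_col, pred_col in zip(observed, predicted):
        pairs = [(o, p) for o, p in zip(obs_col, pred_col) if o is not None]
        yeas = sum(o for o, _ in pairs)
        minority_total += min(yeas, len(pairs) - yeas)  # errors of the majority baseline
        correct += sum(o == p for o, p in pairs)
        total += len(pairs)
    errors = total - correct
    classification_rate = correct / total
    apre = (minority_total - errors) / minority_total
    return classification_rate, apre

rate, apre = fit_metrics([[1, 1, 1, 0], [1, 0, 0, 0]], [[1, 1, 1, 1], [1, 0, 0, 0]])
print(rate, apre)  # 0.875 0.5
```

The sketch does not guard against empty input or a data set with no minority votes, where both ratios would be undefined.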

Implementation Variants

nominate_by_legislatura: runs W-NOMINATE separately for each legislature, producing independent ideal point spaces per period
nominate_cross_legislatura: combines all legislatures into a single run, placing all legislators in a shared space for direct comparison

Dependencies

scipy (svd, minimize, norm), numpy, pandas

Co-Voting Analysis

Co-voting measures how often each pair of legislators votes the same way. This is the foundation for all network-based analysis.

Construction Pipeline

Load data: votes, persons, and organizations from SQLite
Party normalization: normalize_party() maps mixed vote.group values to canonical organization IDs
Primary party assignment: get_primary_party() assigns each legislator to their most frequent party
Build matrix: build_covotacion_matrix() produces an NxN numpy matrix where entry (i,j) = agreement count between legislators i and j, normalized to 0-1
Build graph: build_graph() converts the matrix to a NetworkX graph with:
- Nodes: legislators, with attributes for party, gender
- Edges: co-voting pairs, with weight = normalized similarity

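from itertools import combinations

# Simplified co-voting weight calculation.
# votes[p] is assumed to be the set of (vote_event_id, option) pairs cast by legislator p.
weights = {}
for i, j in combinations(sorted(votes), 2):
    shared_votes = votes[i] & votes[j]   # same option on the same vote event
    total_votes = votes[i] | votes[j]    # every distinct vote cast by either legislator
    weights[i, j] = len(shared_votes) / len(total_votes) if total_votes else 0.0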

Output

The module returns a dict containing:

Key	Type	Description
`matrix`	NxN numpy array	Pairwise co-voting similarity
`graph`	networkx.Graph	Weighted co-voting network
`party_map`	dict	person_id -> party mapping
`org_map`	dict	org_id -> party name mapping
`persons_df`	DataFrame	Legislator metadata

Community Detection (Louvain)

The Louvain algorithm detects communities of legislators who vote similarly, going beyond formal party labels to reveal actual voting blocs.

Algorithm

Louvain performs two-phase iterative optimization:

Local moving: each node moves to the neighbor community that yields the largest modularity gain
Aggregation: communities are collapsed into super-nodes, and the process repeats

The resolution parameter controls community granularity:

Resolution	Effect
< 1.0	Fewer, larger communities (coarser)
1.0 (default)	Standard modularity
> 1.0	More, smaller communities (finer)
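
For reference, the quantity being optimized is modularity in its standard resolution-weighted form, where \gamma is the resolution (LOUVAIN_RESOLUTION), A_{ij} the co-voting weight between legislators i and j, k_i the weighted degree of i, m the total edge weight, and c_i the community of i:

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Raising \gamma enlarges the penalty term, so only more tightly connected groups remain worth merging, which is why values above 1.0 yield smaller communities.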

Implementation

The module uses NetworkX’s built-in Louvain implementation (nx.community.louvain_communities, available since NetworkX 2.7), configured via the shared config.py parameters:

communities = nx.community.louvain_communities(
    graph,
    weight="weight",
    resolution=config.LOUVAIN_RESOLUTION,
    seed=config.LOUVAIN_SEED,  # 42 for reproducibility
)

:::note The seed parameter ensures reproducible community partitions across runs. The resolution value can be tuned in config.py without modifying the detection module. :::

Output

detect_communities() returns a partition dict mapping each node_id to a community_id.
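
Since `louvain_communities` returns a list of node sets, a conversion step is needed to obtain such a dict. A minimal sketch, where the helper name `to_partition` and the largest-first numbering are illustrative choices rather than the module's documented behavior:

```python
def to_partition(communities):
    """Turn a list of node sets into {node: community_id}."""
    # Largest community first, so community 0 is always the biggest bloc.
    # Ties are broken by sorted member ids to keep the numbering deterministic.
    ordered = sorted(communities, key=lambda c: (-len(c), sorted(c)))
    return {node: cid for cid, members in enumerate(ordered) for node in members}

partition = to_partition([{"P03"}, {"P01", "P02"}])
print(partition["P01"], partition["P03"])  # 0 1
```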

analyze_communities() produces detailed analysis per community:

Party composition: count and percentage of each party within the community
Purity metric: percentage of the dominant party (100% = pure party bloc)
Cross-party legislators: individuals whose community differs from their formal party
Sub-blocks: detection of internal factions within large parties (specifically MORENA sub-bloques)
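
The purity metric and the cross-party list could be derived from the partition and the party map roughly as follows. This sketch reads "cross-party" as "party differs from the dominant party of the legislator's community", which is an interpretation; the function names are hypothetical.

```python
from collections import Counter

def community_purity(partition, party_map):
    """Return {community_id: (dominant_party, purity_percent)}."""
    parties_by_community = {}
    for node, cid in partition.items():
        parties_by_community.setdefault(cid, Counter())[party_map[node]] += 1
    result = {}
    for cid, counts in parties_by_community.items():
        party, size = counts.most_common(1)[0]
        result[cid] = (party, 100.0 * size / sum(counts.values()))
    return result

def cross_party(partition, party_map, purity):
    """Legislators whose party differs from the dominant party of their community."""
    return sorted(n for n, cid in partition.items() if party_map[n] != purity[cid][0])
```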

:::tip A community with purity below 70% signals a genuine cross-party coalition, not just a party label. These mixed communities often reveal real legislative alliances. :::

Dependencies

networkx (built-in nx.community.louvain_communities, available since NetworkX 2.7)

Centrality Metrics

Centrality identifies structurally important legislators in the co-voting network. Two complementary metrics are used.

Weighted Degree Centrality

centrality[node] = weighted_degree(node) / max_weighted_degree

Each node’s weighted degree is the sum of its edge weights (total co-voting intensity with all other legislators), normalized by the maximum weighted degree in the graph. Values range from 0.0 to 1.0.

Interpretation: high weighted degree = legislator co-votes heavily with many others, indicating alignment with the dominant coalition.
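
The normalization can be illustrated without NetworkX, working directly on an edge list. The function name and input format are assumptions of this sketch.

```python
def weighted_degree_centrality(edges):
    """edges: iterable of (u, v, weight) for an undirected graph."""
    strength = {}
    for u, v, w in edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    top = max(strength.values(), default=0.0)
    return {n: (s / top if top else 0.0) for n, s in strength.items()}

scores = weighted_degree_centrality([("A", "B", 0.9), ("A", "C", 0.6), ("B", "C", 0.3)])
print(max(scores, key=scores.get))  # A
```

Legislators with no edges never appear in the edge list, so they receive no score here; a graph-based implementation would report them as 0.0.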

Betweenness Centrality

betweenness = nx.betweenness_centrality(graph, weight=None)

Betweenness is computed unweighted (weight=None). This is a deliberate choice:

:::tip Co-voting weights are similarity measures, not geodesic distances. Higher weight = more similar = closer. If passed as weight to NetworkX, the algorithm would interpret them as costs, treating strongly co-voting pairs as far apart. This inverts the intended interpretation. Using weight=None counts each edge as one hop, correctly identifying legislators who bridge distinct voting blocs. :::

Interpretation: high betweenness = legislator sits on the shortest paths between different communities, acting as a potential broker or swing voter.

Metric	Weight handling	Captures
Weighted Degree	Uses weights	Overall co-voting intensity
Betweenness	Ignores weights (weight=None)	Structural bridging position

Power Indices (Nominal)

Nominal power indices calculate how much bargaining power each party has based solely on seat counts, assuming all members vote with their party.

Shapley-Shubik Index

For a party p, the Shapley-Shubik index computes marginal power using dynamic programming instead of brute-force permutation enumeration:

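from math import factorial

def shapley_shubik(player_weights, quota):
    n = len(player_weights)
    results = {}
    for i in range(n):
        weights_without_i = player_weights[:i] + player_weights[i + 1:]
        # dp[s][w] = number of subsets of size s with total weight w
        #            (excluding player i); totals >= quota are never needed
        dp = [[0] * quota for _ in range(n)]
        dp[0][0] = 1
        for weight in weights_without_i:
            # Descending s so each player is added to a subset at most once
            for s in range(n - 2, -1, -1):
                for w in range(quota - weight):
                    dp[s + 1][w + weight] += dp[s][w]
        # Player i is pivotal when the s players before it total w < quota
        # and adding its own weight reaches the quota
        ss_i = 0
        for s in range(n):
            for w in range(max(0, quota - player_weights[i]), quota):
                ss_i += dp[s][w] * factorial(s) * factorial(n - 1 - s)
        results[i] = ss_i / factorial(n)
    return results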

A party is critical (a pivot) when it joins a coalition that is below the quota, and its weight pushes the total to or above the quota. The DP table counts how many coalition configurations make each party critical.
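
The pivot definition can be checked directly by enumerating orderings, which is only practical for a handful of players but is handy for cross-checking a DP implementation. The function name is hypothetical.

```python
from itertools import permutations
from math import factorial

def shapley_shubik_bruteforce(player_weights, quota):
    """Count, for each player, the orderings in which it is the pivot."""
    n = len(player_weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        running = 0
        for player in order:
            running += player_weights[player]
            if running >= quota:      # this player pushes the coalition to the quota
                pivots[player] += 1
                break
    return {i: pivots[i] / factorial(n) for i in range(n)}

# With weights [2, 1, 1] and quota 3, player 0 is the pivot in 4 of the 6 orderings
# and the other two players in 1 ordering each.
print(shapley_shubik_bruteforce([2, 1, 1], quota=3))
```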

Complexity: O(n²W) per player, where n = number of parties and W = quota. For 13 parties with quota ~251, that is roughly 42K table updates per player (13² × 251), compared to 6.2 billion orderings (13!) with brute-force permutation enumeration.

Banzhaf Index

For a party p, count all winning coalitions where p is critical (its defection flips the outcome):

for coalition in all_subsets(parties):
    if coalition is winning AND coalition - {party} is losing:
        party is critical in this coalition
banzhaf_index[party] = critical_count[party] / total_critical_count
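
A runnable version of the pseudocode above, as a sketch. It enumerates all 2^n coalitions, which is fine for around a dozen parties. The function name and the `{party: seats}` input format are assumptions.

```python
from itertools import combinations

def banzhaf(seats, quota):
    """seats: {party: seat_count}. Returns the normalized Banzhaf index per party."""
    critical = dict.fromkeys(seats, 0)
    for size in range(1, len(seats) + 1):
        for coalition in combinations(seats, size):
            total = sum(seats[p] for p in coalition)
            if total < quota:
                continue  # losing coalition, nobody is critical
            for party in coalition:
                if total - seats[party] < quota:  # defection flips the outcome
                    critical[party] += 1
    total_critical = sum(critical.values())
    return {p: (c / total_critical if total_critical else 0.0) for p, c in critical.items()}

print(banzhaf({"A": 2, "B": 1, "C": 1}, quota=3))  # {'A': 0.6, 'B': 0.2, 'C': 0.2}
```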

Seat Assignment

Multi-membership (legislators belonging to more than one party across their career) is resolved by:

Collect all party memberships for each legislator
Assign to the party where the legislator cast the most votes
Ties broken by most recent start_date membership
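
The rule can be sketched as a single `max` over a composite key. The input shapes (per-party vote counts and ISO-formatted start dates) are assumptions made for illustration.

```python
def assign_party(vote_counts, start_dates):
    """vote_counts: {party: votes cast under that party}; start_dates: {party: ISO date}.

    Most votes wins; ties go to the membership with the latest start_date.
    ISO date strings compare chronologically, so no date parsing is needed.
    """
    return max(vote_counts, key=lambda party: (vote_counts[party], start_dates[party]))

print(assign_party(
    {"PRI": 120, "MORENA": 120, "PVEM": 15},
    {"PRI": "2018-09-01", "MORENA": "2021-09-01", "PVEM": "2015-09-01"},
))  # MORENA
```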

Per-Chamber Analysis

Both indices support separate analysis for Diputados (camara='D') and Senado (camara='S').

Empirical Power Analysis

Nominal power assumes party discipline. Empirical power measures what actually happens in roll-call votes.

Critical Parties per Vote

For each vote event, the module identifies which parties were necessary to reach the majority threshold:

winning_coalition = parties that voted with the majority
for party in winning_coalition:
    seats_without = majority_seats - party_seats
    if seats_without < majority_threshold:
        party is "critical" for this vote
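
A runnable sketch of the same check for a single vote event. It assumes the input counts the members of each party who actually voted with the winning side; the function name is hypothetical.

```python
def critical_parties(majority_votes, majority_threshold):
    """majority_votes: {party: members who voted with the winning side}."""
    majority_seats = sum(majority_votes.values())
    return {
        party
        for party, party_seats in majority_votes.items()
        if majority_seats - party_seats < majority_threshold
    }

result = critical_parties({"MORENA": 200, "PT": 40, "PVEM": 30, "MC": 10}, majority_threshold=251)
print(sorted(result))  # ['MORENA', 'PT', 'PVEM']
```

In this example the winning side holds 280 votes: removing MC still leaves 270, so MC is not critical, while removing any of the other three drops the total below 251.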

Empirical Power Index

empirical_power[party] = times_critical[party] / total_vote_events

This produces a 0.0-1.0 score reflecting how often a party’s votes were actually decisive.
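
Aggregating the per-vote results into the index is then a frequency count. The function name and input format below are illustrative assumptions.

```python
from collections import Counter

def empirical_power(critical_by_event, parties):
    """critical_by_event: one set of critical parties per vote event."""
    times_critical = Counter(p for critical in critical_by_event for p in critical)
    total = len(critical_by_event)
    return {p: (times_critical[p] / total if total else 0.0) for p in parties}

power = empirical_power(
    [{"MORENA", "PT"}, {"MORENA"}, set(), {"MORENA", "PT"}],
    ["MORENA", "PT", "PAN"],
)
print(power)  # {'MORENA': 0.75, 'PT': 0.5, 'PAN': 0.0}
```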

Swing Voters and Close Votes

The module identifies:

Close votes: vote events decided by a narrow margin, at most CLOSE_VOTES_THRESHOLD (default 10)
Swing voters: individual legislators whose vote could have changed the outcome
Top dissenters: legislators who voted against their party line most frequently, ranked by dissent count
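
Two of these can be sketched briefly. The sketch assumes that "margin" means the absolute difference between Yea and Nay counts and that the party line for each vote event is already known; both are assumptions, as are the function names and record layout.

```python
from collections import Counter

CLOSE_VOTES_THRESHOLD = 10  # default from config.py

def is_close_vote(yeas, nays, threshold=CLOSE_VOTES_THRESHOLD):
    """Assumes 'margin' means the absolute Yea/Nay difference."""
    return abs(yeas - nays) <= threshold

def top_dissenters(records, party_line, top_n=10):
    """records: (person_id, vote_event_id, party, option) tuples.

    party_line: {(vote_event_id, party): option the party majority chose}.
    Returns [(person_id, dissent_count), ...] sorted by dissent count, highest first.
    """
    dissent = Counter(
        person for person, event, party, option in records
        if option != party_line[(event, party)]
    )
    return dissent.most_common(top_n)
```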

Power Comparison

The key output is a four-way comparison:

Index	Basis	What It Measures
Nominal (seats)	Seat count	Formal representation
Shapley-Shubik	Seat distribution	Bargaining power (pivot-based, computed via DP)
Banzhaf	Seat distribution	Bargaining power (coalition-based)
Empirical	Actual votes	Real-world relevance

Divergences between nominal and empirical power reveal parties that are formally small but strategically critical (or vice versa).

:::tip A party with 5% of seats but an empirical power index of 20% is a kingmaker: its votes are disproportionately decisive. This pattern appears when a dominant coalition frequently needs a small party to cross the majority threshold. :::

Runner Infrastructure

Seven analysis runners (run_*.py) provide CLI access to individual analyses. They share common infrastructure from runner_utils.py:

Runner	Analysis Module	What it runs
`run_analysis.py`	All	Complete analysis pipeline
`run_nominate.py`	`nominate.py`	W-NOMINATE ideal point estimation
`run_covotacion_dinamica.py`	`covotacion_dinamica.py`	Time-windowed co-voting
`run_evolucion_partidos.py`	`evolucion_partidos.py`	Party evolution across legislatures
`run_efecto_genero.py`	`efecto_genero.py`	Gender effect on voting behavior
`run_efecto_curul_tipo.py`	`efecto_curul_tipo.py`	Seat type effect on voting
`run_trayectorias.py`	`trayectorias.py`	Individual legislator trajectories

All runners support --camara (diputados/senado) and --output-dir flags via runner_utils.build_simple_parser(). The run_for_cameras() helper allows running an analysis for one or both chambers.
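
A parser with these two flags might look like the sketch below. The function `build_parser`, the default output directory, and the "omit --camara to run both chambers" behavior are assumptions, not the actual contents of runner_utils.py.

```python
import argparse

def build_parser(description):
    """Hypothetical equivalent of runner_utils.build_simple_parser()."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--camara", choices=["diputados", "senado"], default=None,
                        help="chamber to analyze; omit to run both (assumed behavior)")
    parser.add_argument("--output-dir", default="output",
                        help="directory for generated files (placeholder default)")
    return parser

args = build_parser("W-NOMINATE runner").parse_args(["--camara", "senado"])
cameras = [args.camara] if args.camara else ["diputados", "senado"]
print(cameras, args.output_dir)  # ['senado'] output
```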

Logging

All runners use runner_utils.setup_logging() for consistent log formatting:

2026-04-15 10:30:00 - INFO - analysis.poder_empirico - Starting analysis...
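
A standard-library configuration that yields this line format is sketched below. It is one possible implementation, not necessarily what runner_utils.setup_logging() does.

```python
import logging

def setup_logging(level=logging.INFO):
    """Sketch of a shared logging setup producing the line format shown above."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        force=True,  # replace handlers installed by earlier calls (Python 3.8+)
    )

setup_logging()
logging.getLogger("analysis.poder_empirico").info("Starting analysis...")
```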

Temporal Dynamics

All methods support temporal analysis across legislatures.

Per-Legislature Analysis

Each legislature gets its own independent analysis:

Separate W-NOMINATE ideal point spaces
Independent co-voting graphs and community detection
Chamber-specific power indices reflecting seat changes

Cross-Legislature Comparison

nominate_cross_legislatura places all legislators in a shared ideal point space, enabling direct comparison across time periods
Community evolution tracking: which legislators shift communities between legislatures, and what that implies about coalition realignment

Modularity Trends

Tracking the Louvain modularity score across legislatures reveals changes in voting bloc cohesion:

Rising modularity: parties are voting more cohesively, sharper partisan divisions
Declining modularity: cross-party voting increasing, blocs dissolving or realigning
Sudden drops: may indicate a major legislative event (reform vote, leadership change) that disrupted normal voting patterns