This document outlines practical analyses that connect voting behavior to bill subjects using the existing schema (bills
, bill_subjects
, rollcalls
, votes
, members
). It balances quick wins with deeper methods you can iterate toward.
bills.policy_area
(broad) and bill_subjects.subject_term
(granular). Support both.Yea=1
, Nay=0
; exclude Present
and Not Voting
from rate calculations.question
contains any of:
votes v
JOIN rollcalls r ON v.rollcall_id = r.rollcall_id
JOIN bills b ON r.bill_id = b.bill_id
LEFT JOIN bill_subjects s ON s.bill_id = b.bill_id
JOIN members m ON v.member_id_bioguide = m.member_id_bioguide
WHERE b.congress = 119 AND b.chamber = 'house'
AND v.vote_code IN ('Yea','Nay')
Party split by subject_term
(portable AVG form):
SELECT s.subject_term,
m.party,
SUM(CASE WHEN v.vote_code='Yea' THEN 1 ELSE 0 END) /
NULLIF(SUM(CASE WHEN v.vote_code IN ('Yea','Nay') THEN 1 ELSE 0 END),0) AS yea_rate,
COUNT(*) AS votes
FROM votes v
JOIN rollcalls r ON v.rollcall_id = r.rollcall_id
JOIN bills b ON r.bill_id = b.bill_id
JOIN bill_subjects s ON s.bill_id = b.bill_id
JOIN members m ON v.member_id_bioguide = m.member_id_bioguide
WHERE b.congress = 119 AND b.chamber='house'
GROUP BY s.subject_term, m.party
HAVING votes >= 20
ORDER BY votes DESC;
Member outliers by subject (residual = member_rate − party_mean):
-- Party mean per subject
WITH party_stats AS (
SELECT s.subject_term, m.party,
SUM(v.vote_code='Yea')/SUM(v.vote_code IN ('Yea','Nay')) AS party_yea
FROM votes v
JOIN rollcalls r ON v.rollcall_id=r.rollcall_id
JOIN bills b ON r.bill_id=b.bill_id
JOIN bill_subjects s ON s.bill_id=b.bill_id
JOIN members m ON v.member_id_bioguide=m.member_id_bioguide
WHERE b.congress=119 AND b.chamber='house' AND v.vote_code IN ('Yea','Nay')
GROUP BY s.subject_term, m.party
), member_stats AS (
SELECT s.subject_term, v.member_id_bioguide,
SUM(v.vote_code='Yea')/SUM(v.vote_code IN ('Yea','Nay')) AS member_yea,
COUNT(*) AS n
FROM votes v
JOIN rollcalls r ON v.rollcall_id=r.rollcall_id
JOIN bills b ON r.bill_id=b.bill_id
JOIN bill_subjects s ON s.bill_id=b.bill_id
WHERE b.congress=119 AND b.chamber='house' AND v.vote_code IN ('Yea','Nay')
GROUP BY s.subject_term, v.member_id_bioguide
)
SELECT ms.subject_term, ms.member_id_bioguide, m.party,
ms.member_yea - ps.party_yea AS residual,
ms.n
FROM member_stats ms
JOIN members m ON m.member_id_bioguide=ms.member_id_bioguide
JOIN party_stats ps ON ps.subject_term=ms.subject_term AND ps.party=m.party
WHERE ms.n >= 8
ORDER BY ABS(residual) DESC
LIMIT 50;
Caucus cohesion by subject:
SELECT s.subject_term, c.short_name AS caucus,
SUM(CASE WHEN v.vote_code = majority_vote.vote_code THEN 1 ELSE 0 END)/COUNT(*) AS cohesion,
COUNT(*) AS votes
FROM votes v
JOIN rollcalls r ON v.rollcall_id=r.rollcall_id
JOIN bills b ON r.bill_id=b.bill_id
JOIN bill_subjects s ON s.bill_id=b.bill_id
JOIN caucus_memberships cm ON cm.member_id_bioguide = v.member_id_bioguide AND cm.end_date IS NULL
JOIN caucuses c ON c.id = cm.caucus_id
JOIN (
SELECT r2.rollcall_id,
CASE WHEN SUM(v2.vote_code='Yea') >= SUM(v2.vote_code='Nay') THEN 'Yea' ELSE 'Nay' END AS vote_code
FROM votes v2 JOIN rollcalls r2 ON v2.rollcall_id=r2.rollcall_id
WHERE v2.vote_code IN ('Yea','Nay')
GROUP BY r2.rollcall_id
) majority_vote ON majority_vote.rollcall_id = v.rollcall_id
WHERE b.congress=119 AND b.chamber='house' AND v.vote_code IN ('Yea','Nay')
GROUP BY s.subject_term, c.short_name
HAVING votes >= 20;
GET /api/subjects/summary?scope=policy_area|subject_term&min_votes=20&exclude_procedural=true|false
[{subject, d_yea, r_yea, gap, votes, p_value, cramers_v}]
GET /api/subjects//detail?scope=...
member_subject_stats
(member_id, subject, n, yea_rate) to speed UI.y: 1 if Yea, 0 if Nay
X: party (binary), sponsor_party, subject dummies, caucus flags, maybe member fixed effects
from sklearn.linear_model import LogisticRegression
import pandas as pdX, y = make_features(votes_joined_df) # one row per vote
model = LogisticRegression(max_iter=200)
model.fit(X, y)
coefs = pd.Series(model.coef_[0], index=X.columns)
Interpret subject_* coefficients as subject effects controlling for party/caucus
--- This plan should fit easily into the current Flask app: add the two endpoints, a summary page, and a detail page, then cache the heavy aggregations.