This guide explains how our system automatically extracts rich metadata from Congress.gov API to enable sophisticated voting pattern analysis.
Our system integrates with the official Congress.gov API to extract:
Congress.gov API → Enhanced Daily Update Script → Database → Analysis Engine
Every day at 6 AM, our cron job runs:
Automated daily discovery
0 6 * cd /home/jmknapp/congressional-coalitions && \
venv/bin/python scripts/enhanced_daily_update.py --congress 119 --max-bills 100
For each bill discovered:
bills.policy_area
and bill_subjects
last_updated
timestamp for change detectionThe enhanced_daily_update.py
script now includes:
def fetch_bill_subjects(self, congressgov_id: str) -> List[str]:
"""Fetch bill subjects from Congress.gov API."""
subjects = []
subjects_url = f"https://api.congress.gov/v3/bill/{congressgov_id}/subjects"
headers = {'X-API-Key': self.congressgov_api_key}
try:
response = self.session.get(subjects_url, headers=headers, timeout=30)
if response.status_code == 200:
data = response.json()
if 'subjects' in data and 'legislativeSubjects' in data['subjects']:
for subject in data['subjects']['legislativeSubjects']:
if 'name' in subject:
subjects.append(subject['name'])
except Exception as e:
logger.debug(f"Error fetching subjects: {e}")
return subjects
Official CRS Classifications:
Detailed CRS Terms:
For existing bills without metadata, use the backfill script:
Backfill all bills for Congress 119
python scripts/backfill_bill_metadata.py --congress 119 --api-key YOUR_API_KEYTest with limited bills first
python scripts/backfill_bill_metadata.py --congress 119 --limit 10 --api-key YOUR_API_KEY
CREATE TABLE bills (
bill_id VARCHAR(50) PRIMARY KEY,
congress INT NOT NULL,
chamber VARCHAR(10) NOT NULL,
number INT NOT NULL,
type VARCHAR(10) NOT NULL,
title VARCHAR(1000),
introduced_date DATE,
sponsor_bioguide VARCHAR(20),
policy_area VARCHAR(200), -- NEW: CRS policy classification
summary_short VARCHAR(2000),
last_updated DATETIME, -- NEW: Change tracking
created_at DATETIME DEFAULT NOW(),
updated_at DATETIME DEFAULT NOW()
);
CREATE TABLE bill_subjects (
id INT PRIMARY KEY AUTO_INCREMENT,
bill_id VARCHAR(50) FOREIGN KEY REFERENCES bills(bill_id),
subject_term VARCHAR(200) NOT NULL, -- CRS legislative subject
created_at DATETIME DEFAULT NOW()
);
Policy Area Validation:
Subject Term Validation:
API Failures:
Data Quality:
-- Bills by policy area
SELECT policy_area, COUNT(*) as bill_count
FROM bills
WHERE policy_area IS NOT NULL
GROUP BY policy_area
ORDER BY bill_count DESC;
-- Bills with specific subjects
SELECT b.bill_id, b.title, b.policy_area
FROM bills b
JOIN bill_subjects bs ON b.bill_id = bs.bill_id
WHERE bs.subject_term LIKE '%Budget%'
ORDER BY b.introduced_date DESC;
-- Policy area + subject combinations
SELECT b.policy_area, bs.subject_term, COUNT(*) as count
FROM bills b
JOIN bill_subjects bs ON b.bill_id = bs.bill_id
GROUP BY b.policy_area, bs.subject_term
HAVING count > 5
ORDER BY count DESC;
Daily Checks:
Weekly Reviews:
Metadata Enhancement:
Performance Optimization:
---
This metadata extraction system provides the foundation for advanced congressional voting pattern analysis and collusion detection. The automated daily updates ensure data freshness while the backfill capability maintains historical completeness.