Spicy Regs Data Dictionary
This is the schema reference for the Spicy Regs dataset — an open mirror of regulations.gov federal regulatory data, published as Apache Parquet on a public Cloudflare R2 bucket.
It documents every published table, column by column. The schema is generated directly from the code that defines and produces the data, and a CI check fails whenever the schema and these descriptions drift apart — so this reference stays in step with what's actually published.
Where the data comes from
regulations.gov → Mirrulations S3 mirror → Spicy Regs ETL → Parquet on R2
(r2.spicy-regs.dev)
The ETL flattens the raw regulations.gov JSON into a handful of flat tables and
publishes them, plus a few small pre-computed rollups, to
https://r2.spicy-regs.dev.
The tables
| Table | Grain | Queryable via MCP |
|---|---|---|
dockets |
one row per docket | Yes |
documents |
one row per document | Yes |
comments |
one row per public comment | Yes |
comments_index |
one row per comment partition | Yes |
feed_summary |
one row per docket (rollup) | Yes |
agency_stats |
one row per agency (rollup) | Yes |
agency_monthly_volume |
one row per agency/month/type (rollup) | Yes |
How the tables relate
The three core tables form a simple hierarchy keyed by id:
dockets (docket_id)
└── documents (document_id, docket_id →)
└── comments (comment_id, docket_id →)
documents.docket_idandcomments.docket_idreferencedockets.docket_id.agency_codeappears on every table and is the join key for the agency rollups.- The rollups (
comments_index,feed_summary,agency_stats,agency_monthly_volume) are pre-aggregated views built from the three core tables so consumers don't have to scan the tens-of-millions-of-rows comments dataset.
How to query it
=== "AI assistant (MCP)"
The hosted MCP server exposes `list_sources`, `describe_table`, and
`query_sql` over all of the tables above. Add
`https://mcp.spicy-regs.dev/mcp` as a connector, or run it locally:
```bash
claude mcp add spicy-regs -- uvx --from "spicy-regs @ git+https://github.com/civictechdc/spicy-regs" spicy-regs-mcp
```
=== "CLI"
```bash
uvx --from "spicy-regs @ git+https://github.com/civictechdc/spicy-regs" spicy-regs download
uv run spicy-regs stats
```
=== "DuckDB (SQL)"
```sql
INSTALL httpfs; LOAD httpfs;
SELECT agency_code, COUNT(*) AS dockets
FROM read_parquet('https://r2.spicy-regs.dev/dockets.parquet')
GROUP BY agency_code
ORDER BY dockets DESC
LIMIT 20;
```
Keeping this current
Column names and types are the source of truth in code
(RECORD_TYPES for the core tables, DERIVED_SCHEMAS for the rollups). The
prose lives in data_dictionary/descriptions.yaml. Run
uv run spicy-regs-dict generate to rebuild the table pages, and
uv run spicy-regs-dict check to verify the two are in sync — the same
check runs in CI on every pull request.