comments
One row per public comment submitted to a docket — the largest table (tens of millions of rows). On R2 it is also published as Hive-partitioned files under comments/agency_code=.../docket_id=.../year=.../month=.... Joins to dockets on docket_id.
- Parquet file: comments.parquet
- Queryable via MCP query_sql: Yes
- Primary / dedup key: comment_id
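The Hive-partitioned layout described above encodes each partition value as a key=value path segment. As a minimal sketch of how a client might recover those values from an object key, the helper below splits a key on `/` and keeps only the key=value segments. The function name `parse_partition_path`, the example agency, docket, and file name are all hypothetical; the real object keys on R2 may differ in details such as zero-padding of the month.

```python
def parse_partition_path(path: str) -> dict[str, str]:
    """Extract Hive-style key=value segments from an object key."""
    parts: dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only key=value segments carry partition values
            parts[key] = value
    return parts


# Hypothetical object key: the agency, docket, and file name are made up
# for illustration and are not taken from the published bucket.
example_key = (
    "comments/agency_code=EPA/docket_id=EPA-HQ-OAR-2021-0001/"
    "year=2021/month=3/part-0.parquet"
)
partition = parse_partition_path(example_key)
```

Partition values come back as strings, so a caller that needs numeric year or month values would convert them explicitly.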
| Column | Type | Description |
|---|---|---|
| comment_id 🔑 | VARCHAR | Unique comment identifier. Primary key / dedup key. |
| docket_id | VARCHAR | Docket the comment was submitted to. Foreign key to dockets.docket_id. |
| agency_code | VARCHAR | Receiving agency's short code. |
| first_name | VARCHAR | Commenter's first name, when provided. Often null. |
| last_name | VARCHAR | Commenter's last name, when provided. Often null. |
| organization | VARCHAR | Organization the commenter represents, when provided. Often null. |
| category | VARCHAR | Submitter category as classified on regulations.gov. Often null. |
| title | VARCHAR | Comment title / subject line. |
| comment | VARCHAR | Full free-text body of the comment. The largest field in the dataset. |
| document_type | VARCHAR | Document category for the comment record, typically Public Submission. |
| posted_date | VARCHAR | Date the comment was posted publicly (ISO 8601 string). |
| modify_date | VARCHAR | Timestamp the comment was last modified (ISO 8601 string). |
| receive_date | VARCHAR | Date the agency received the comment (ISO 8601 string). |
| attachments_json | VARCHAR | JSON array of any files attached to the comment: [{title, formats:[{url, format, size}]}]. Null when there are no attachments. |
| text_content | VARCHAR | Plain text of the comment's attachment(s). Filled inline during the ETL from Mirrulations' pre-extracted text (the bucket's derived-data prefix); attachments not yet extracted upstream are backfilled by the on-demand PDF text-extraction step. Null when neither has produced text; see text_extraction_status. |
| text_extraction_status | VARCHAR | Outcome of attachment text extraction: ok (text was filled, from derived-data or PDF extraction), empty (no extractable text, e.g. scanned/image-only PDFs; OCR is out of scope), encrypted (password-protected), or error. Null when no text has been filled yet. |
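Because attachments_json is stored as a VARCHAR and text_content is only meaningful alongside text_extraction_status, a consumer typically has to decode and check both. The sketch below assumes the JSON shape given in the table, [{title, formats:[{url, format, size}]}], and treats a row's text as usable only when the status is ok. The helper names `attachment_urls` and `usable_text` and the sample record are hypothetical, not part of the dataset or its tooling.

```python
import json
from typing import Optional


def attachment_urls(attachments_json: Optional[str]) -> list[str]:
    """Return every format URL listed in an attachments_json value."""
    if attachments_json is None:  # null means the comment has no attachments
        return []
    urls: list[str] = []
    for attachment in json.loads(attachments_json):
        # Tolerate a missing or null "formats" list (an assumption about
        # how incomplete records might look).
        for fmt in attachment.get("formats") or []:
            url = fmt.get("url")
            if url:
                urls.append(url)
    return urls


def usable_text(
    text_content: Optional[str], text_extraction_status: Optional[str]
) -> Optional[str]:
    """Return attachment text only when extraction reported ok."""
    if text_extraction_status == "ok" and text_content:
        return text_content
    # empty, encrypted, error, or null status: no text to use
    return None


# Made-up sample value following the documented shape.
sample = json.dumps(
    [
        {
            "title": "Letter",
            "formats": [
                {"url": "https://example.org/letter.pdf", "format": "pdf", "size": 1234}
            ],
        }
    ]
)
```

Keeping the status check separate from the null check on text_content makes it possible to distinguish rows that have not been processed yet (null status) from rows where extraction ran but produced nothing (empty, encrypted, or error).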