`comments`

One row per public comment submitted to a docket. This is the largest table (tens of millions of rows). On R2 it is also published as Hive-partitioned files under `comments/agency_code=.../docket_id=.../year=.../month=...`. Joins to `dockets` on `docket_id`.
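
As a rough illustration of the partition layout described above, the following Python sketch pulls the `key=value` segments out of an object path. The helper name `parse_hive_partitions`, the example agency and docket values, the zero-padded month, and the file name `part-0.parquet` are all assumptions made for the example and are not taken from the published files.

```python
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Extract Hive-style key=value segments from an object path."""
    partitions: dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        # Only segments of the form key=value are partition directories.
        if sep and key:
            # Partition values may be URL-encoded by the writer (assumption).
            partitions[key] = unquote(value)
    return partitions


# Hypothetical object key: agency, docket, and file name are invented.
example = (
    "comments/agency_code=EPA/docket_id=EPA-HQ-OAR-2021-0001/"
    "year=2021/month=06/part-0.parquet"
)
partitions = parse_hive_partitions(example)
```

Partition values come back as strings, so `year` and `month` would need an explicit `int(...)` before numeric comparison. Which date column the `year` and `month` partitions are derived from is not stated here, so treat that as something to confirm before filtering on them.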

Parquet file: `comments.parquet`
Queryable via MCP `query_sql`: Yes
Primary / dedup key: `comment_id`
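
Since `comment_id` is the dedup key, rows gathered from more than one file can be collapsed to one row per comment. The sketch below is one possible approach, not the pipeline's own logic: it assumes rows are plain dictionaries, that the row with the latest `modify_date` should win, and that `modify_date` strings share a single ISO 8601 format so they sort lexicographically. The function name `dedupe_comments` is hypothetical.

```python
def dedupe_comments(rows: list[dict]) -> list[dict]:
    """Keep one row per comment_id, preferring the latest modify_date."""
    latest: dict[str, dict] = {}
    for row in rows:
        key = row["comment_id"]
        current = latest.get(key)
        # Assumption: same-format ISO 8601 strings compare correctly as text.
        # A missing modify_date is treated as the empty string (oldest).
        row_date = row.get("modify_date") or ""
        if current is None or row_date >= (current.get("modify_date") or ""):
            latest[key] = row
    # dicts preserve first-insertion order, so output order is stable.
    return list(latest.values())


# Invented sample rows for illustration only.
sample = [
    {"comment_id": "A-1", "modify_date": "2021-06-01T00:00:00Z"},
    {"comment_id": "A-2", "modify_date": None},
    {"comment_id": "A-1", "modify_date": "2021-06-03T00:00:00Z"},
]
deduped = dedupe_comments(sample)
```

If the date strings turn out to mix formats or time zones, parsing them into datetimes before comparing would be the safer choice.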

Column	Type	Description
`comment_id` 🔑	`VARCHAR`	Unique comment identifier. Primary key / dedup key.
`docket_id`	`VARCHAR`	Docket the comment was submitted to. Foreign key to `dockets.docket_id`.
`agency_code`	`VARCHAR`	Receiving agency's short code.
`first_name`	`VARCHAR`	Commenter's first name, when provided. Often null.
`last_name`	`VARCHAR`	Commenter's last name, when provided. Often null.
`organization`	`VARCHAR`	Organization the commenter represents, when provided. Often null.
`category`	`VARCHAR`	Submitter category as classified on regulations.gov. Often null.
`title`	`VARCHAR`	Comment title / subject line.
`comment`	`VARCHAR`	Full free-text body of the comment. The largest field in the dataset.
`document_type`	`VARCHAR`	Document category for the comment record, typically `Public Submission`.
`posted_date`	`VARCHAR`	Date the comment was posted publicly (ISO 8601 string).
`modify_date`	`VARCHAR`	Timestamp the comment was last modified (ISO 8601 string).
`receive_date`	`VARCHAR`	Date the agency received the comment (ISO 8601 string).
`attachments_json`	`VARCHAR`	JSON array of any files attached to the comment: `[{title, formats:[{url, format, size}]}]`. Null when there are no attachments.
`text_content`	`VARCHAR`	Plain text of the comment's attachment(s). Filled inline during the ETL from Mirrulations' pre-extracted text (the bucket's `derived-data` prefix); attachments not yet extracted upstream are backfilled by the on-demand PDF text-extraction step. Null when neither has produced text; see `text_extraction_status`.
`text_extraction_status`	`VARCHAR`	Outcome of attachment text extraction: `ok` (text was filled, from derived-data or PDF extraction), `empty` (no extractable text, e.g. scanned/image-only PDFs — OCR is out of scope), `encrypted` (password-protected), or `error`. Null when no text has been filled yet.