comments

One row per public comment submitted to a docket — the largest table (tens of millions of rows). On R2 it is also published as Hive-partitioned files under comments/agency_code=.../docket_id=.../year=.../month=.... Joins to dockets on docket_id.
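
The partitioned layout encodes column values in the object key itself, so they can be recovered without opening a file. The sketch below is a minimal illustration, not part of the published tooling: the helper name `parse_partition_path`, the agency code `ABC`, the docket ID, the zero-padded month, and the file name `part-0.parquet` are all assumptions made up for the example.

```python
from pathlib import PurePosixPath


def parse_partition_path(path: str) -> dict[str, str]:
    """Return the key=value partition segments found in an object key."""
    partitions = {}
    for segment in PurePosixPath(path).parts:
        key, sep, value = segment.partition("=")
        if sep:  # only segments of the form key=value are partitions
            partitions[key] = value
    return partitions


# Hypothetical object key: agency, docket, and file name are invented.
example_key = (
    "comments/agency_code=ABC/docket_id=ABC-2021-0001/"
    "year=2021/month=06/part-0.parquet"
)
print(parse_partition_path(example_key))
```

Partition values come back as strings; cast `year` and `month` to integers if you need to compare them numerically.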

  • Parquet file: comments.parquet
  • Queryable via MCP query_sql: Yes
  • Primary / dedup key: comment_id
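
Because comment_id is the dedup key, rows loaded from more than one source (for example, the single Parquet file and the partitioned files) can be collapsed to one row per comment. The sketch below shows one way to do that in plain Python. Keeping the row with the latest modify_date is an assumption of this example, not a documented rule, and the helper names and sample rows are hypothetical.

```python
from datetime import datetime, timezone


def modified_key(row: dict) -> datetime:
    """Sort key from modify_date; a missing value sorts as the oldest."""
    value = row.get("modify_date")
    if value is None:
        return datetime.min.replace(tzinfo=timezone.utc)
    # Rewrite a trailing 'Z' so older Python versions can parse it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def dedup_comments(rows: list[dict]) -> list[dict]:
    """Keep one row per comment_id, preferring the latest modify_date."""
    latest: dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["comment_id"])
        if current is None or modified_key(row) > modified_key(current):
            latest[row["comment_id"]] = row
    return list(latest.values())


# Hypothetical rows; the IDs and titles are invented for illustration.
rows = [
    {"comment_id": "ABC-2021-0001-0002", "modify_date": "2021-06-01T12:00:00Z", "title": "first version"},
    {"comment_id": "ABC-2021-0001-0002", "modify_date": "2021-06-03T09:30:00Z", "title": "revised"},
    {"comment_id": "ABC-2021-0001-0003", "modify_date": None, "title": "other"},
]
print([row["title"] for row in dedup_comments(rows)])
```

For a table with tens of millions of rows, the same rule is better expressed in the query engine than in Python; this version only illustrates the logic.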

| Column | Type | Description |
|---|---|---|
| comment_id 🔑 | VARCHAR | Unique comment identifier. Primary key / dedup key. |
| docket_id | VARCHAR | Docket the comment was submitted to. Foreign key to dockets.docket_id. |
| agency_code | VARCHAR | Receiving agency's short code. |
| first_name | VARCHAR | Commenter's first name, when provided. Often null. |
| last_name | VARCHAR | Commenter's last name, when provided. Often null. |
| organization | VARCHAR | Organization the commenter represents, when provided. Often null. |
| category | VARCHAR | Submitter category as classified on regulations.gov. Often null. |
| title | VARCHAR | Comment title / subject line. |
| comment | VARCHAR | Full free-text body of the comment. The largest field in the dataset. |
| document_type | VARCHAR | Document category for the comment record, typically Public Submission. |
| posted_date | VARCHAR | Date the comment was posted publicly (ISO 8601 string). |
| modify_date | VARCHAR | Timestamp the comment was last modified (ISO 8601 string). |
| receive_date | VARCHAR | Date the agency received the comment (ISO 8601 string). |
| attachments_json | VARCHAR | JSON array of any files attached to the comment: [{title, formats:[{url, format, size}]}]. Null when there are no attachments. |
| text_content | VARCHAR | Plain text of the comment's attachment(s). Filled inline during the ETL from Mirrulations' pre-extracted text (the bucket's derived-data prefix); attachments not yet extracted upstream are backfilled by the on-demand PDF text-extraction step. Null when neither has produced text; see text_extraction_status. |
| text_extraction_status | VARCHAR | Outcome of attachment text extraction: ok (text was filled, from derived-data or PDF extraction), empty (no extractable text, e.g. scanned/image-only PDFs; OCR is out of scope), encrypted (password-protected), or error. Null when no text has been filled yet. |
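
The attachments_json and text_extraction_status columns are easiest to use together: the first lists the attached files, the second says whether text_content can be relied on. The sketch below follows the shape documented above ([{title, formats:[{url, format, size}]}]). The helper names, the sample row, and the example.org URL are hypothetical, and returning text only when the status is ok is a choice made for this example.

```python
import json


def attachment_urls(attachments_json):
    """Flatten attachments_json into (title, format, url) tuples."""
    if attachments_json is None:  # null means the comment has no attachments
        return []
    urls = []
    for attachment in json.loads(attachments_json):
        for fmt in attachment.get("formats") or []:
            urls.append((attachment.get("title"), fmt.get("format"), fmt.get("url")))
    return urls


def usable_text(row):
    """Return attachment text only when extraction reported 'ok'."""
    if row.get("text_extraction_status") == "ok":
        return row.get("text_content")
    # 'empty', 'encrypted', 'error', and null all mean there is no usable text.
    return None


# Hypothetical row; the URL, size, and text are invented for illustration.
row = {
    "attachments_json": json.dumps(
        [{"title": "Letter", "formats": [
            {"url": "https://example.org/letter.pdf", "format": "pdf", "size": 12345}
        ]}]
    ),
    "text_content": "Dear Administrator ...",
    "text_extraction_status": "ok",
}
print(attachment_urls(row["attachments_json"]))
print(usable_text(row))
```

Checking the status rather than testing text_content for null separates "not processed yet" (null status) from "processed, but nothing could be extracted" (empty, encrypted, or error).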