Skip to content

documents

One row per document posted to a docket — proposed rules, final rules, notices, supporting analyses, and public-submission stubs. Joins to dockets on docket_id.

  • Parquet file: documents.parquet
  • Queryable via MCP query_sql: Yes
  • Primary / dedup key: document_id
Column Type Description
document_id 🔑 VARCHAR Unique document identifier. Primary key / dedup key.
docket_id VARCHAR Parent docket this document belongs to. Foreign key to dockets.docket_id.
agency_code VARCHAR Posting agency's short code.
title VARCHAR Document title.
document_type VARCHAR Document category (e.g. Rule, Proposed Rule, Notice, Supporting & Related Material).
posted_date VARCHAR Date the document was posted publicly (ISO 8601 string).
modify_date VARCHAR Timestamp the document was last modified (ISO 8601 string).
comment_start_date VARCHAR Start of the public comment period this document opens, if any. Often null.
comment_end_date VARCHAR End (deadline) of the public comment period, if any. Often null.
file_url VARCHAR URL of the document's primary downloadable rendition. Retained for backward compatibility; see attachments_json for the full list. Often null.
attachments_json VARCHAR JSON array of every downloadable rendition: [{url, format, size}]. Null when the document has no files.
fr_doc_num VARCHAR Federal Register document number, when the document was published in the FR. Often null.
withdrawn VARCHAR Whether the document was withdrawn, as the string "true"/"false". Often null.
reason_withdrawn VARCHAR Agency-supplied reason for withdrawal, when withdrawn. Often null.
additional_rins VARCHAR JSON array of additional Regulation Identifier Numbers beyond the docket's primary RIN. Often null.
text_content VARCHAR Plain text extracted from the document's PDF attachment(s) by the PDF text-extraction step. Null until that step has run; see text_extraction_status.
text_extraction_status VARCHAR Outcome of PDF text extraction: ok, empty (no extractable text, e.g. scanned/image-only PDFs — OCR is out of scope), encrypted (password-protected), or error. Null before extraction has been attempted.