documents
One row per document posted to a docket: proposed rules, final rules, notices, supporting analyses, and public-submission stubs. Joins to the dockets table on docket_id.
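Every document carries its parent docket_id, so counting documents per docket is a plain LEFT JOIN. The sketch below runs that join through Python's built-in sqlite3 module as a stand-in for whatever engine backs query_sql, which this page does not specify. Both tables are cut down to the key columns named here, and the IDs DKT-1, DKT-2, DOC-1 and DOC-2 are made-up placeholders.

```python
import sqlite3

# Count documents per docket. LEFT JOIN keeps dockets that have no documents yet.
COUNT_SQL = """
    SELECT k.docket_id, COUNT(d.document_id) AS n_documents
    FROM dockets AS k
    LEFT JOIN documents AS d ON d.docket_id = k.docket_id
    GROUP BY k.docket_id
    ORDER BY k.docket_id
"""


def build_demo_db():
    """In-memory stand-in with hypothetical rows; the real tables have more columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dockets (docket_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE documents (document_id TEXT PRIMARY KEY, docket_id TEXT)"
    )
    conn.executemany("INSERT INTO dockets VALUES (?)", [("DKT-1",), ("DKT-2",)])
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [("DOC-1", "DKT-1"), ("DOC-2", "DKT-1")],
    )
    return conn


def documents_per_docket(conn):
    """Return (docket_id, n_documents) tuples ordered by docket_id."""
    return conn.execute(COUNT_SQL).fetchall()


if __name__ == "__main__":
    print(documents_per_docket(build_demo_db()))  # [('DKT-1', 2), ('DKT-2', 0)]
```

COUNT(d.document_id) counts only non-null matches, so a docket with no documents appears with a count of 0 instead of being dropped, as it would be with an inner join.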
- Parquet file: documents.parquet
- Queryable via MCP query_sql: Yes
- Primary / dedup key: document_id
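Because document_id is the dedup key, a loader that merges overlapping pulls can collapse duplicates on it. The sketch below assumes that, among rows sharing a document_id, the one with the latest modify_date should win. That tie-breaking rule is an assumption for illustration, not something this schema specifies. Rows are plain dicts keyed by column name.

```python
from datetime import datetime, timezone

# Sentinel so rows with a null modify_date rank below any real timestamp.
_OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_iso(value):
    """Parse an ISO 8601 string; None/empty -> None. Naive values are assumed UTC."""
    if not value:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def dedupe_documents(rows):
    """Keep one row per document_id, preferring the latest modify_date.

    Ties (including two null modify_dates) keep the row seen first.
    Output order follows the first appearance of each document_id.
    """
    best = {}
    for row in rows:
        key = row["document_id"]
        stamp = parse_iso(row.get("modify_date")) or _OLDEST
        kept = best.get(key)
        if kept is None or stamp > kept[0]:
            best[key] = (stamp, row)
    return [row for _, row in best.values()]
```

Timestamps are parsed rather than compared as raw strings, since ISO 8601 strings with different offsets or precision do not always sort correctly as text.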
| Column | Type | Description |
|---|---|---|
| document_id 🔑 | VARCHAR | Unique document identifier. Primary key / dedup key. |
| docket_id | VARCHAR | Parent docket this document belongs to. Foreign key to dockets.docket_id. |
| agency_code | VARCHAR | Posting agency's short code. |
| title | VARCHAR | Document title. |
| document_type | VARCHAR | Document category (e.g. Rule, Proposed Rule, Notice, Supporting & Related Material). |
| posted_date | VARCHAR | Date the document was posted publicly (ISO 8601 string). |
| modify_date | VARCHAR | Timestamp the document was last modified (ISO 8601 string). |
| comment_start_date | VARCHAR | Start of the public comment period this document opens, if any. Often null. |
| comment_end_date | VARCHAR | End (deadline) of the public comment period, if any. Often null. |
| file_url | VARCHAR | URL of the document's primary downloadable rendition. Retained for backward compatibility; see attachments_json for the full list. Often null. |
| attachments_json | VARCHAR | JSON array of every downloadable rendition: [{url, format, size}]. Null when the document has no files. |
| fr_doc_num | VARCHAR | Federal Register document number, when the document was published in the FR. Often null. |
| withdrawn | VARCHAR | Whether the document was withdrawn, as the string "true"/"false". Often null. |
| reason_withdrawn | VARCHAR | Agency-supplied reason for withdrawal, when withdrawn. Often null. |
| additional_rins | VARCHAR | JSON array of additional Regulation Identifier Numbers beyond the docket's primary RIN. Often null. |
| text_content | VARCHAR | Plain text extracted from the document's PDF attachment(s) by the PDF text-extraction step. Null until that step has run; see text_extraction_status. |
| text_extraction_status | VARCHAR | Outcome of PDF text extraction: ok, empty (no extractable text, e.g. scanned/image-only PDFs; OCR is out of scope), encrypted (password-protected), or error. Null before extraction has been attempted. |
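Every column in the table above is stored as VARCHAR, so boolean, JSON, and date values arrive as strings. A minimal decoding sketch for one row (a dict keyed by column name) follows. It assumes comment_start_date and comment_end_date use the same ISO 8601 format as posted_date and modify_date (the table only states this for the latter two), treats naive timestamps as UTC, and maps a null JSON column to an empty list. All three are choices made for illustration, and the output key "attachments" is a hypothetical name.

```python
import json
from datetime import datetime, timezone

DATE_COLUMNS = ("posted_date", "modify_date", "comment_start_date", "comment_end_date")


def parse_bool_str(value):
    """Map withdrawn's "true"/"false" strings to True/False; null stays None."""
    if value is None:
        return None
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"unexpected boolean string: {value!r}")


def parse_json_array(value):
    """Decode a JSON-array column (attachments_json, additional_rins); null -> []."""
    if value is None:
        return []
    decoded = json.loads(value)
    if not isinstance(decoded, list):
        raise ValueError(f"expected a JSON array, got {type(decoded).__name__}")
    return decoded


def parse_iso(value):
    """Parse an ISO 8601 string; None/empty -> None. Naive values are assumed UTC."""
    if not value:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def decode_row(row):
    """Return a copy of a documents row with its string-encoded columns decoded."""
    out = dict(row)
    out["withdrawn"] = parse_bool_str(row.get("withdrawn"))
    # Hypothetical key holding the parsed attachments_json list of {url, format, size}.
    out["attachments"] = parse_json_array(row.get("attachments_json"))
    out["additional_rins"] = parse_json_array(row.get("additional_rins"))
    for col in DATE_COLUMNS:
        out[col] = parse_iso(row.get(col))
    return out
```

Unexpected boolean strings and non-array JSON raise ValueError rather than being coerced, so schema drift shows up at load time.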
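Because text_content stays null until extraction runs, and any status other than ok means there is no usable text, it helps to filter on text_extraction_status before searching text. The helpers below are a sketch over rows represented as dicts; the not_attempted label used for a null status is a hypothetical name chosen here, not a value stored in the column.

```python
from collections import Counter

# Status values documented for text_extraction_status; null means "not yet attempted".
KNOWN_STATUSES = {"ok", "empty", "encrypted", "error"}


def extraction_summary(rows):
    """Tally text_extraction_status, reporting null as 'not_attempted'."""
    counts = Counter()
    for row in rows:
        status = row.get("text_extraction_status")
        if status is None:
            counts["not_attempted"] += 1
        elif status in KNOWN_STATUSES:
            counts[status] += 1
        else:
            # Surface values outside the documented set instead of hiding them.
            raise ValueError(f"unknown extraction status: {status!r}")
    return counts


def searchable_text(rows):
    """Yield (document_id, text_content) for rows whose extraction succeeded."""
    for row in rows:
        if row.get("text_extraction_status") == "ok" and row.get("text_content"):
            yield row["document_id"], row["text_content"]
```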