`documents`

One row per document posted to a docket: proposed rules, final rules, notices, supporting analyses, and public-submission stubs. Joins to `dockets` on `docket_id`.

Parquet file: `documents.parquet`
Queryable via MCP `query_sql`: Yes
Primary / dedup key: `document_id`
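
The join to `dockets` on `docket_id` can be sketched as follows. This is a minimal illustration, not the actual query path: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for whatever engine backs `query_sql`, the `dockets.title` column is assumed for the example, and all IDs and rows are invented placeholders.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the two tables (placeholder data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Only docket_id is documented for dockets; title is an assumption.
        CREATE TABLE dockets (docket_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE documents (
            document_id   TEXT PRIMARY KEY,  -- primary / dedup key
            docket_id     TEXT,              -- foreign key to dockets.docket_id
            document_type TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO dockets VALUES (?, ?)",
        [("DKT-A", "Example docket A"), ("DKT-B", "Example docket B")],
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [
            ("DOC-1", "DKT-A", "Proposed Rule"),
            ("DOC-2", "DKT-A", "Rule"),
            ("DOC-3", "DKT-B", "Notice"),
        ],
    )
    return conn


# Count documents per docket by joining on docket_id.
JOIN_SQL = """
SELECT k.docket_id, k.title AS docket_title, COUNT(d.document_id) AS n_documents
FROM dockets AS k
JOIN documents AS d ON d.docket_id = k.docket_id
GROUP BY k.docket_id, k.title
ORDER BY k.docket_id
"""


def documents_per_docket(conn: sqlite3.Connection) -> list:
    """Return (docket_id, docket_title, n_documents) tuples."""
    return conn.execute(JOIN_SQL).fetchall()


if __name__ == "__main__":
    for row in documents_per_docket(build_demo_db()):
        print(row)
```

The SQL in `JOIN_SQL` is plain standard SQL, so the same statement should be a reasonable starting point for `query_sql`, though dialect details of the real engine are not covered here.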

| Column | Type | Description |
| --- | --- | --- |
| `document_id` 🔑 | `VARCHAR` | Unique document identifier. Primary key / dedup key. |
| `docket_id` | `VARCHAR` | Parent docket this document belongs to. Foreign key to `dockets.docket_id`. |
| `agency_code` | `VARCHAR` | Posting agency's short code. |
| `title` | `VARCHAR` | Document title. |
| `document_type` | `VARCHAR` | Document category (e.g. `Rule`, `Proposed Rule`, `Notice`, `Supporting & Related Material`). |
| `posted_date` | `VARCHAR` | Date the document was posted publicly (ISO 8601 string). |
| `modify_date` | `VARCHAR` | Timestamp the document was last modified (ISO 8601 string). |
| `comment_start_date` | `VARCHAR` | Start of the public comment period this document opens, if any. Often null. |
| `comment_end_date` | `VARCHAR` | End (deadline) of the public comment period, if any. Often null. |
| `file_url` | `VARCHAR` | URL of the document's primary downloadable rendition. Retained for backward compatibility; see `attachments_json` for the full list. Often null. |
| `attachments_json` | `VARCHAR` | JSON array of every downloadable rendition: `[{url, format, size}]`. Null when the document has no files. |
| `fr_doc_num` | `VARCHAR` | Federal Register document number, when the document was published in the FR. Often null. |
| `withdrawn` | `VARCHAR` | Whether the document was withdrawn, as the string `"true"`/`"false"`. Often null. |
| `reason_withdrawn` | `VARCHAR` | Agency-supplied reason for withdrawal, when withdrawn. Often null. |
| `additional_rins` | `VARCHAR` | JSON array of additional Regulation Identifier Numbers beyond the docket's primary RIN. Often null. |
| `text_content` | `VARCHAR` | Plain text extracted from the document's PDF attachment(s) by the PDF text-extraction step. Null until that step has run; see `text_extraction_status`. |
| `text_extraction_status` | `VARCHAR` | Outcome of PDF text extraction: `ok`, `empty` (no extractable text, e.g. scanned or image-only PDFs; OCR is out of scope), `encrypted` (password-protected), or `error`. Null before extraction has been attempted. |
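
Because every column is a `VARCHAR`, consumers typically need to decode the boolean, JSON, and date columns themselves. The sketch below shows one way to do that for a single row held as a Python dict. The helper names (`parse_withdrawn`, `parse_json_array`, `parse_iso`, `decode_row`) are hypothetical, the sample row is invented, and the exact ISO 8601 variants and the types inside `attachments_json` entries are assumptions, since the table does not pin them down.

```python
import json
from datetime import datetime
from typing import Any, Dict, List, Optional


def parse_withdrawn(value: Optional[str]) -> Optional[bool]:
    """Map the strings "true"/"false" to bool; null stays None."""
    if value is None:
        return None
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"unexpected withdrawn value: {value!r}")


def parse_json_array(value: Optional[str]) -> List[Any]:
    """Decode a JSON-array column; null is treated as an empty list."""
    if value is None:
        return []
    decoded = json.loads(value)
    if not isinstance(decoded, list):
        raise ValueError("expected a JSON array")
    return decoded


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string; null stays None.

    Assumption: values are either date-only or timestamps, possibly with a
    trailing "Z". The "Z" is rewritten because datetime.fromisoformat only
    accepts it from Python 3.11 onward.
    """
    if value is None:
        return None
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def decode_row(row: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Return typed versions of the string-encoded columns of one row."""
    return {
        "document_id": row.get("document_id"),
        "withdrawn": parse_withdrawn(row.get("withdrawn")),
        "attachments": parse_json_array(row.get("attachments_json")),
        "additional_rins": parse_json_array(row.get("additional_rins")),
        "posted_date": parse_iso(row.get("posted_date")),
        "modify_date": parse_iso(row.get("modify_date")),
        # Text is usable only when extraction ran and reported "ok".
        "has_text": row.get("text_extraction_status") == "ok"
        and bool(row.get("text_content")),
    }


if __name__ == "__main__":
    # Invented placeholder row, not real data.
    sample = {
        "document_id": "DOC-1",
        "withdrawn": "false",
        "attachments_json": '[{"url": "https://example.invalid/doc.pdf", '
        '"format": "pdf", "size": 1024}]',
        "additional_rins": None,
        "posted_date": "2024-03-01",
        "modify_date": "2024-03-02T12:30:00Z",
        "text_content": "Example text.",
        "text_extraction_status": "ok",
    }
    print(decode_row(sample))
```

Treating a null `attachments_json` or `additional_rins` as an empty list is a convenience choice in this sketch; callers that need to distinguish "null" from "empty array" should keep `None` instead.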