Dictionaries
Built-in word and phrase lists in disclosure_alpha.dictionaries. Version: built_in_dictionaries_v3.
Exported constants
| Constant | Type | Purpose |
|---|---|---|
| | | Audit trail per pack (source, match_type, consumers, license) |
| | | Token list → |
| | | Token list → |
| | | Token list → |
| | | Token list → |
| | | Modal tier (weak) |
| | | Modal tier (moderate) |
| | | Modal tier (strong) |
| | | Union of modal tiers → |
| | | Phrase list → |
| | | Phrase list → |
| | | Specificity proxy |
| | | Specificity proxy |
| | | Diff engine topic clusters |
| | | Topic intensity modifier (±10 token window) |
| | | Boolean section flags |
| | | Per-flag section allowlist |
| | | Per-flag sentence-level suppression phrases |
| | | MD&A phrase density packs |
| | | Section regex maps by form type |
| | | Minimum sections per form |
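The "±10 token window" in the topic intensity modifier row can be illustrated with a small sketch. This is not the diff engine's code: the function name `modifiers_near_topic`, its parameters, and the sample terms are all hypothetical, and the sketch assumes the window is counted in tokens on each side of the topic term.

```python
def modifiers_near_topic(tokens, topic_terms, modifier_terms, window=10):
    """Return (topic_index, modifier) pairs where a modifier token lies
    within `window` tokens on either side of a topic token.

    Hypothetical helper; the real diff engine may work differently.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in topic_terms:
            continue
        # Clamp the window to the bounds of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in modifier_terms:
                hits.append((i, tokens[j]))
    return hits


# Illustrative sentence and term sets (not taken from the built-in lists).
tokens = "we expect a significant increase in litigation exposure next year".split()
hits = modifiers_near_topic(tokens, {"litigation"}, {"significant"})
```

Here `hits` holds one pair, because "significant" sits three tokens before "litigation"; shrinking `window` to 2 would exclude it.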
TERM_PACK_METADATA shape
Each pack entry includes:
{
    "source": "built_in_finance_curated",  # or "sec_pcaob_fasb_phrase_curated" for flags
    "match_type": "token",  # or "phrase"
    "consumer": ["metric_or_score_name", ...],
    "license": "repo_safe_manual_curation",
}
Metadata is documentation-only; runtime matching uses the constant lists above.
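Because the metadata is documentation-only, a lightweight shape check can help keep entries consistent. The following is a minimal sketch under stated assumptions: `check_pack_entry`, `REQUIRED_KEYS`, and `ALLOWED_MATCH_TYPES` are hypothetical names that are not part of `disclosure_alpha`, and the key set is taken from the shape shown above.

```python
# Keys and match types as listed in the shape above (assumed to be exhaustive).
REQUIRED_KEYS = {"source", "match_type", "consumer", "license"}
ALLOWED_MATCH_TYPES = {"token", "phrase"}


def check_pack_entry(entry):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of the package.
    """
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if "match_type" in entry and entry["match_type"] not in ALLOWED_MATCH_TYPES:
        problems.append(f"unexpected match_type: {entry['match_type']!r}")
    if "consumer" in entry and not isinstance(entry["consumer"], list):
        problems.append("consumer should be a list of names")
    return problems


example_entry = {
    "source": "built_in_finance_curated",
    "match_type": "token",
    "consumer": ["metric_or_score_name"],
    "license": "repo_safe_manual_curation",
}
problems = check_pack_entry(example_entry)
```

A check like this could run in a unit test over every pack entry, so a typo in a key name is caught without touching runtime matching.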
Matching helpers
Phrase and token matching logic lives in disclosure_alpha.text_matching (shared by text_metrics and diff_engine).
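To make the token/phrase distinction concrete, here is a minimal sketch of the two matching styles. It does not reproduce the `disclosure_alpha.text_matching` API: `tokenize`, `count_token_hits`, and `count_phrase_hits` are hypothetical names, and the simple lowercase regex tokenizer is an assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_token_hits(tokens, vocabulary):
    """Count tokens that appear in a single-word vocabulary."""
    vocab = set(vocabulary)
    return sum(1 for tok in tokens if tok in vocab)


def count_phrase_hits(tokens, phrases):
    """Count occurrences of multi-word phrases as contiguous token runs."""
    total = 0
    for phrase in phrases:
        parts = tokenize(phrase)
        n = len(parts)
        if n == 0:
            continue
        # Slide a window of length n over the token list.
        total += sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts
        )
    return total


# Illustrative text and term lists (not the built-in dictionaries).
text = "Results may differ materially. Going concern doubt may arise."
tokens = tokenize(text)
token_hits = count_token_hits(tokens, ["may", "could"])
phrase_hits = count_phrase_hits(tokens, ["going concern", "differ materially"])
```

Note that this sketch discards punctuation, so a phrase could match across a sentence boundary; a production matcher may need to guard against that.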