Skip to content

Disclosure Alpha

Dictionaries

alwank/disclosure-alpha

Dictionaries¶

Built-in word and phrase lists in disclosure_alpha.dictionaries. Version: built_in_dictionaries_v3.

Exported constants¶

Constant	Type	Purpose
`TERM_PACK_METADATA`	`dict[str, dict]`	Audit trail per pack (source, match_type, consumers, license)
`NEGATIVE_WORDS`	`frozenset[str]`	Token list → `negative_word_ratio`
`UNCERTAINTY_WORDS`	`frozenset[str]`	Token list → `uncertainty_word_ratio`
`LITIGIOUS_WORDS`	`frozenset[str]`	Token list → `litigious_word_ratio`
`CONSTRAINING_WORDS`	`frozenset[str]`	Token list → `constraining_word_ratio`
`WEAK_MODAL_WORDS`	`frozenset[str]`	Modal tier (weak)
`MODERATE_MODAL_WORDS`	`frozenset[str]`	Modal tier (moderate)
`STRONG_MODAL_WORDS`	`frozenset[str]`	Modal tier (strong)
`MODAL_WORDS`	`frozenset[str]`	Union of modal tiers → `modal_word_ratio`
`LEGAL_REGULATORY_PHRASES`	`list[str]`	Phrase list → `legal_regulatory_phrase_ratio`
`BOILERPLATE_PHRASES`	`list[str]`	Phrase list → `boilerplate_phrase_ratio`
`GEOGRAPHY_TERMS`	`frozenset[str]`	Specificity proxy
`SEGMENT_TERMS`	`frozenset[str]`	Specificity proxy
`TOPIC_KEYWORDS`	`dict[str, list[str]]`	Diff engine topic clusters
`SEVERITY_WORDS`	`frozenset[str]`	Topic intensity modifier (±10 token window)
`FLAG_PATTERNS`	`dict[str, list[str]]`	Boolean section flags
`FLAG_SECTION_SCOPE`	`dict[str, frozenset[str]]`	Per-flag section allowlist
`FLAG_SUPPRESSIONS`	`dict[str, list[str]]`	Per-flag sentence-level suppression phrases
`MDNA_DENSITY_TERMS`	`dict[str, list[str]]`	MD&A phrase density packs
`SUPPORTED_SECTIONS_*`	`dict[str, str]`	Section regex maps by form type
`REQUIRED_SECTIONS`	`dict[str, list[str]]`	Minimum sections per form

`TERM_PACK_METADATA` shape¶

Each pack entry includes:

{
    "source": "built_in_finance_curated",  # or sec_pcaob_fasb_phrase_curated for flags
    "match_type": "token" | "phrase",
    "consumer": ["metric_or_score_name", ...],
    "license": "repo_safe_manual_curation",
}

Metadata is documentation-only; runtime matching uses the constant lists above.

Matching helpers¶

Phrase and token matching logic lives in disclosure_alpha.text_matching (shared by text_metrics and diff_engine).