text_metrics
Use when: You need per-section tone ratios, specificity scores, boolean flags, or MD&A density packs — the raw signals that feed component scores.
Start here
- compute_text_metrics(): tone ratios, specificity, boilerplate, readability for one section
- detect_section_flags(): boolean risk flags scoped by section name
- compute_density_metrics(): MD&A keyword densities (item_7_mdna, item_2_mdna)
- SectionTextInput / TextMetricResult: input and output types
In the pipeline, compute_section_metrics() calls these for every extracted section. Ratio fields map to components via Aggregation (e.g. negative_word_ratio → risk_factor_intensity_score).
Example
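from disclosure_alpha.text_metrics import SectionTextInput, compute_text_metrics, detect_section_flags

# Placeholder input: in practice, the cleaned text of one extracted section.
cleaned_text = "We are subject to an ongoing government investigation."
inp = SectionTextInput("item_1a_risk_factors", cleaned_text)
metrics = compute_text_metrics(inp)  # returns a TextMetricResult
flags = detect_section_flags(cleaned_text, "item_1a_risk_factors")  # returns dict[str, bool]
print(metrics.negative_word_ratio, flags["investigation_flag"])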
Full API
- class disclosure_alpha.text_metrics.SectionTextInput(section_name: str, cleaned_text: str)
  Bases: object
  - section_name : str
  - cleaned_text : str
- class disclosure_alpha.text_metrics.TextMetricResult(word_count: int, sentence_count: int, average_sentence_length: float, readability_score: float | None, negative_word_ratio: float, uncertainty_word_ratio: float, litigious_word_ratio: float, constraining_word_ratio: float, modal_word_ratio: float, weak_modal_word_ratio: float, moderate_modal_word_ratio: float, strong_modal_word_ratio: float, legal_regulatory_phrase_ratio: float, numeric_specificity_score: float, company_specificity_score: float, boilerplate_phrase_ratio: float)
  Bases: object
  - word_count : int
  - sentence_count : int
  - average_sentence_length : float
  - readability_score : float | None
  - negative_word_ratio : float
  - uncertainty_word_ratio : float
  - litigious_word_ratio : float
  - constraining_word_ratio : float
  - modal_word_ratio : float
  - weak_modal_word_ratio : float
  - moderate_modal_word_ratio : float
  - strong_modal_word_ratio : float
  - legal_regulatory_phrase_ratio : float
  - numeric_specificity_score : float
  - company_specificity_score : float
  - boilerplate_phrase_ratio : float
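This page does not define how the `*_word_ratio` fields are computed. As a rough illustration only, a lexicon-based ratio of this kind is often taken as the number of tokens found in a word list divided by the total token count. The sketch below assumes that definition. The lexicon `NEGATIVE_WORDS`, the tokenizer, and the helper `word_ratio` are hypothetical and are not part of `disclosure_alpha`.

```python
import re

# Hypothetical mini-lexicon for illustration; the library's real word lists are not shown on this page.
NEGATIVE_WORDS = {"loss", "decline", "adverse", "impairment"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_ratio(text: str, lexicon: set[str]) -> float:
    """Share of tokens that appear in the lexicon; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


sample = "Adverse conditions led to a loss and a decline in revenue"
ratio = word_ratio(sample, NEGATIVE_WORDS)  # 3 lexicon hits out of 11 tokens
print(round(ratio, 3))
```

Guarding the empty-text case keeps the ratio defined for sections that contain no words after cleaning; whether the library does the same is not stated here.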
- disclosure_alpha.text_metrics.compute_text_metrics(inp: SectionTextInput) → TextMetricResult
- disclosure_alpha.text_metrics.compute_metric_families(inp: SectionTextInput) → list[dict[str, float | str]]
Return metric family rows (tone, specificity, boilerplate, liquidity, internal_controls) with raw and normalized values.
- disclosure_alpha.text_metrics.detect_section_flags(text: str, section_name: str) → dict[str, bool]
Return all v1 boolean flags for a section (False when out of scope).
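The "False when out of scope" behavior can be pictured with a small stand-alone sketch: every flag key is always present in the result, and a flag can only be True when the section is one the flag applies to. This is an assumption about the shape of the logic, not the library's implementation. The rule table `FLAG_RULES`, the regex patterns, the scoping sets, the flag name `going_concern_flag`, and the function `sketch_section_flags` are all hypothetical; only `investigation_flag` and the section names appear elsewhere on this page.

```python
import re

# Hypothetical rules: flag name -> (pattern, sections where the flag is in scope).
FLAG_RULES = {
    "investigation_flag": (
        re.compile(r"\binvestigation\b", re.IGNORECASE),
        {"item_1a_risk_factors"},
    ),
    "going_concern_flag": (
        re.compile(r"\bgoing concern\b", re.IGNORECASE),
        {"item_7_mdna", "item_2_mdna"},
    ),
}


def sketch_section_flags(text: str, section_name: str) -> dict[str, bool]:
    """Return every flag; a flag is False whenever the section is out of its scope."""
    flags: dict[str, bool] = {}
    for name, (pattern, scope) in FLAG_RULES.items():
        in_scope = section_name in scope
        flags[name] = in_scope and pattern.search(text) is not None
    return flags


text = "The SEC opened an investigation; there is substantial doubt about going concern."
risk_flags = sketch_section_flags(text, "item_1a_risk_factors")
mdna_flags = sketch_section_flags(text, "item_7_mdna")
print(risk_flags)
print(mdna_flags)
```

Returning the full key set regardless of section means callers can index any flag without checking for its presence first, which matches the `flags["investigation_flag"]` lookup in the example above.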
- disclosure_alpha.text_metrics.compute_density_metrics(text: str, section_name: str) → dict[str, float]
MD&A keyword density: hits per 1000 words, capped 0–100.
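The "hits per 1000 words, capped 0–100" arithmetic can be worked through in a few lines. The sketch below assumes single-word keywords, a simple regex tokenizer, and that "capped" means clamping the result to the range [0, 100]. The helper `density_per_1000` and the keyword list are hypothetical and do not come from `disclosure_alpha`.

```python
import re


def density_per_1000(text: str, keywords: list[str]) -> float:
    """Keyword hits per 1000 words, clamped to [0, 100]; 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(words.count(keyword) for keyword in keywords)
    raw = 1000.0 * hits / len(words)
    return max(0.0, min(100.0, raw))


# Hypothetical keyword list for illustration.
keywords = ["liquidity", "covenant"]

# 2 hits in 50 words -> 1000 * 2 / 50 = 40.0
moderate = density_per_1000("liquidity covenant " + "filler " * 48, keywords)

# 2 hits in 2 words -> raw value 1000.0, clamped to 100.0
saturated = density_per_1000("liquidity liquidity", keywords)

print(moderate, saturated)
```

Under these assumptions the cap is reached once more than one word in ten is a keyword, so short, keyword-heavy passages saturate quickly.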