text_metrics

Use when: You need per-section tone ratios, specificity scores, boolean flags, or MD&A density packs — the raw signals that feed component scores.

Start here

compute_text_metrics() — tone ratios, specificity, boilerplate, readability for one section
detect_section_flags() — boolean risk flags scoped by section name
compute_density_metrics() — MD&A keyword densities (item_7_mdna, item_2_mdna)
SectionTextInput / TextMetricResult — input and output types

In the pipeline, compute_section_metrics() calls these for every extracted section. Ratio fields map to components via Aggregation (e.g. negative_word_ratio → risk_factor_intensity_score).

Example

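from disclosure_alpha.text_metrics import SectionTextInput, compute_text_metrics, detect_section_flags

# Placeholder text; in practice this is the cleaned text of an extracted section.
cleaned_text = "The Company is subject to an ongoing investigation that may adversely affect results."

# Tone, specificity, boilerplate, and readability metrics for one section.
inp = SectionTextInput("item_1a_risk_factors", cleaned_text)
metrics = compute_text_metrics(inp)
# Boolean risk flags, scoped by section name.
flags = detect_section_flags(cleaned_text, "item_1a_risk_factors")
print(metrics.negative_word_ratio, flags["investigation_flag"])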

Full API

class disclosure_alpha.text_metrics.SectionTextInput(section_name: str, cleaned_text: str)

Bases: object

section_name : str

cleaned_text : str

class disclosure_alpha.text_metrics.TextMetricResult

Bases: object

word_count : int

sentence_count : int

average_sentence_length : float

readability_score : float | None

negative_word_ratio : float

uncertainty_word_ratio : float

litigious_word_ratio : float

constraining_word_ratio : float

modal_word_ratio : float

weak_modal_word_ratio : float

moderate_modal_word_ratio : float

strong_modal_word_ratio : float

legal_regulatory_phrase_ratio : float

numeric_specificity_score : float

company_specificity_score : float

boilerplate_phrase_ratio : float

disclosure_alpha.text_metrics.tokenize_words(text: str) → list[str]

disclosure_alpha.text_metrics.compute_text_metrics(inp: SectionTextInput) → TextMetricResult
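This page does not define how the ratio fields on TextMetricResult are computed. As a rough illustration only, the sketch below assumes a ratio is the share of tokens that appear in a word list. The tokenizer, the function names (sketch_tokenize, sketch_word_ratio), and the three-word lexicon are all hypothetical and are not taken from disclosure_alpha.

```python
import re

# Hypothetical lexicon for illustration; the library's actual word lists are not shown here.
NEGATIVE_WORDS = {"loss", "decline", "adverse"}


def sketch_tokenize(text: str) -> list[str]:
    """Lowercase the text and return alphabetic tokens (assumed behavior)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def sketch_word_ratio(text: str, lexicon: set[str]) -> float:
    """Share of tokens found in the lexicon; 0.0 for empty text."""
    words = sketch_tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return hits / len(words)


# 2 of the 7 tokens ("adverse", "loss") are in the lexicon.
ratio = sketch_word_ratio("Adverse conditions caused a loss this year.", NEGATIVE_WORDS)
```

Dividing by the token count keeps the value comparable across sections of different lengths, which is presumably why the fields are exposed as ratios rather than raw counts.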

disclosure_alpha.text_metrics.compute_metric_families(inp: SectionTextInput) → list[dict[str, float | str]]

Return metric family rows (tone, specificity, boilerplate, liquidity, internal_controls) with raw and normalized values.

disclosure_alpha.text_metrics.detect_section_flags(text: str, section_name: str) → dict[str, bool]

Return all v1 boolean flags for a section (False when out of scope).
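The "False when out of scope" contract can be pictured with a small sketch: every flag key is always present, and a flag is only evaluated when the section is within that flag's scope. The rule table below is hypothetical. investigation_flag appears in the example on this page, but its scope and pattern here are assumptions, and going_concern_flag is invented for illustration.

```python
import re

# Hypothetical rules: flag name -> (sections in scope, pattern to search for).
FLAG_RULES = {
    "investigation_flag": (
        {"item_1a_risk_factors"},
        re.compile(r"\binvestigation\b", re.IGNORECASE),
    ),
    "going_concern_flag": (
        {"item_7_mdna", "item_2_mdna"},
        re.compile(r"\bgoing concern\b", re.IGNORECASE),
    ),
}


def sketch_detect_flags(text: str, section_name: str) -> dict[str, bool]:
    """Return every flag; a flag is False when the section is outside its scope."""
    return {
        name: section_name in sections and bool(pattern.search(text))
        for name, (sections, pattern) in FLAG_RULES.items()
    }


flags = sketch_detect_flags("The SEC opened an investigation.", "item_1a_risk_factors")
```

Returning the full key set regardless of section means callers can index any flag without checking for its presence first.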

disclosure_alpha.text_metrics.compute_density_metrics(text: str, section_name: str) → dict[str, float]

Return MD&A keyword densities: hits per 1,000 words, capped to the range 0–100.
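To make "hits per 1,000 words, capped 0–100" concrete, here is a minimal sketch for a single keyword group. The keyword tuple, the function name sketch_density, and the choice to return 0.0 outside the MD&A sections are assumptions. The real function returns a dict of densities whose keys are not documented on this page.

```python
import re

# Hypothetical keyword group; the library's MD&A keyword lists are not shown here.
LIQUIDITY_KEYWORDS = ("liquidity", "covenant", "refinancing")
MDNA_SECTIONS = {"item_7_mdna", "item_2_mdna"}


def sketch_density(text: str, section_name: str, keywords=LIQUIDITY_KEYWORDS) -> float:
    """Keyword hits per 1,000 words, clamped to [0, 100]."""
    if section_name not in MDNA_SECTIONS:
        return 0.0  # assumed out-of-scope behavior
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in keywords)
    return min(100.0, max(0.0, 1000.0 * hits / len(words)))


# 2 hits in 11 words is about 181.8 per 1,000 words, so the cap applies.
text = "Liquidity remains adequate and no covenant was breached during the period"
density = sketch_density(text, "item_7_mdna")
```

The cap matters mostly for very short sections, where a couple of hits would otherwise produce an outsized density.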