text_metrics

Use when: You need per-section tone ratios, specificity scores, boolean flags, or MD&A density packs — the raw signals that feed component scores.

Start here

  • compute_text_metrics() — tone ratios, specificity, boilerplate, readability for one section

  • detect_section_flags() — boolean risk flags scoped by section name

  • compute_density_metrics() — MD&A keyword densities (item_7_mdna, item_2_mdna)

  • SectionTextInput / TextMetricResult — input and output types

In the pipeline, compute_section_metrics() calls these for every extracted section. Ratio fields map to components via Aggregation (e.g. negative_word_ratio → risk_factor_intensity_score).

Example

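from disclosure_alpha.text_metrics import SectionTextInput, compute_text_metrics, detect_section_flags

# Placeholder input for illustration; in practice this is the cleaned text of an extracted section.
cleaned_text = "We are subject to a pending government investigation."

# Per-section metrics (tone ratios, specificity, boilerplate, readability).
inp = SectionTextInput("item_1a_risk_factors", cleaned_text)
metrics = compute_text_metrics(inp)
# Boolean risk flags, scoped by section name.
flags = detect_section_flags(cleaned_text, "item_1a_risk_factors")
print(metrics.negative_word_ratio, flags["investigation_flag"])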

Full API

class disclosure_alpha.text_metrics.SectionTextInput(section_name: str, cleaned_text: str)

Bases: object

section_name : str
cleaned_text : str

class disclosure_alpha.text_metrics.TextMetricResult(word_count: int, sentence_count: int, average_sentence_length: float, readability_score: float | None, negative_word_ratio: float, uncertainty_word_ratio: float, litigious_word_ratio: float, constraining_word_ratio: float, modal_word_ratio: float, weak_modal_word_ratio: float, moderate_modal_word_ratio: float, strong_modal_word_ratio: float, legal_regulatory_phrase_ratio: float, numeric_specificity_score: float, company_specificity_score: float, boilerplate_phrase_ratio: float)

Bases: object

word_count : int
sentence_count : int
average_sentence_length : float
readability_score : float | None
negative_word_ratio : float
uncertainty_word_ratio : float
litigious_word_ratio : float
constraining_word_ratio : float
modal_word_ratio : float
weak_modal_word_ratio : float
moderate_modal_word_ratio : float
strong_modal_word_ratio : float
legal_regulatory_phrase_ratio : float
numeric_specificity_score : float
company_specificity_score : float
boilerplate_phrase_ratio : float

disclosure_alpha.text_metrics.tokenize_words(text: str) → list[str]

disclosure_alpha.text_metrics.compute_text_metrics(inp: SectionTextInput) → TextMetricResult

disclosure_alpha.text_metrics.compute_metric_families(inp: SectionTextInput) → list[dict[str, float | str]]

Return metric family rows (tone, specificity, boilerplate, liquidity, internal_controls) with raw and normalized values.

disclosure_alpha.text_metrics.detect_section_flags(text: str, section_name: str) → dict[str, bool]

Return all v1 boolean flags for a section (False when out of scope).
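
The "False when out of scope" contract can be pictured with a small standalone sketch. This is not the library's implementation: the rule table, the regex patterns, the going_concern_flag name, and the item_3_legal_proceedings section name are all hypothetical, chosen only to show that every flag key is always present and that a flag can only be True for a section it is scoped to.

```python
import re

# Hypothetical rule table: flag name -> (sections in scope, pattern to search for).
# Only "investigation_flag" and "item_1a_risk_factors" appear in the documentation above;
# everything else here is an assumption made for illustration.
FLAG_RULES = {
    "investigation_flag": (
        {"item_1a_risk_factors", "item_3_legal_proceedings"},
        r"\binvestigation\b",
    ),
    "going_concern_flag": (
        {"item_7_mdna"},
        r"\bgoing concern\b",
    ),
}


def sketch_section_flags(text: str, section_name: str) -> dict[str, bool]:
    """Return every flag; a flag is False if the section is out of its scope."""
    lowered = text.lower()
    return {
        name: section_name in sections and re.search(pattern, lowered) is not None
        for name, (sections, pattern) in FLAG_RULES.items()
    }


in_scope = sketch_section_flags("The SEC opened an investigation.", "item_1a_risk_factors")
out_of_scope = sketch_section_flags("The SEC opened an investigation.", "item_9_other")
```

Because the result always carries the same keys, callers can index a flag directly (as in flags["investigation_flag"]) without checking whether it applies to the section.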

disclosure_alpha.text_metrics.compute_density_metrics(text: str, section_name: str) → dict[str, float]

MD&A keyword density: hits per 1000 words, capped 0–100.
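
As a rough illustration of "hits per 1000 words, capped 0–100", the following standalone sketch computes one density value. The keyword list and the tokenization regex are assumptions, not the library's lexicon or tokenizer, and the real function returns a dict of several densities rather than a single float.

```python
import re

# Hypothetical keyword list for illustration only.
LIQUIDITY_KEYWORDS = ("liquidity", "covenant", "going concern")


def sketch_keyword_density(text: str, keywords=LIQUIDITY_KEYWORDS) -> float:
    """Keyword hits per 1000 words, clamped to the range 0-100."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    # Rejoin tokens with single spaces so multi-word keywords match reliably.
    normalized = " ".join(words)
    hits = sum(
        len(re.findall(rf"\b{re.escape(keyword)}\b", normalized))
        for keyword in keywords
    )
    density = hits * 1000 / len(words)
    return max(0.0, min(100.0, density))


sparse = sketch_keyword_density("liquidity " + "word " * 99)  # 1 hit in 100 words
dense = sketch_keyword_density("liquidity liquidity")         # 2 hits in 2 words, hits the cap
```

Scaling by word count makes sections of different lengths comparable, and the cap keeps a very short, keyword-heavy section from dominating downstream scores.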