diff_engine

Use when: You have current and prior section text and need change scores, topic lists, or language deltas — typically as part of a prior-filing comparison workflow.

Start here

  • compute_section_diff() — full diff result including disclosure_change_score and language_deltas

  • SectionDiffResult — typed output with similarities, topics, and confidence

  • lexical_similarity() — TF-IDF cosine similarity helper
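To make "TF-IDF cosine similarity" concrete, here is a minimal standard-library sketch of the technique. It is not the library's implementation: the tokenizer, the smoothed IDF formula, and the names `tokenize` and `tfidf_cosine` are all assumptions chosen for illustration, and `lexical_similarity()` may weight or normalize terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of TF-IDF vectors built from just the two input texts."""
    counts = [Counter(tokenize(text_a)), Counter(tokenize(text_b))]
    if not counts[0] or not counts[1]:
        return 0.0  # an empty text has no direction to compare

    vocab = set(counts[0]) | set(counts[1])
    n_docs = 2
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so that terms shared by both
    # texts keep a nonzero weight (plain ln(n / df) would zero them out).
    idf = {
        term: math.log((1 + n_docs) / (1 + sum(term in c for c in counts))) + 1.0
        for term in vocab
    }

    vec_a = {t: tf * idf[t] for t, tf in counts[0].items()}
    vec_b = {t: tf * idf[t] for t, tf in counts[1].items()}

    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b)


score = tfidf_cosine(
    "We may face litigation and regulatory investigation.",
    "We operate in a competitive market.",
)
print(round(score, 3))
```

Identical texts score 1.0 (up to floating-point error) and texts with no shared tokens score 0.0. With only two documents, the choice of IDF smoothing has a large effect on intermediate values, so scores from this sketch should not be expected to match the library's.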

Prior text is required for meaningful change scores: when prior_text is None, disclosure_change_score is None. See FAQ and Troubleshooting.

Example

from disclosure_alpha.diff_engine import compute_section_diff

# All arguments are keyword-only; the section IDs are optional and default to None.
diff = compute_section_diff(
    current_text="We may face litigation and regulatory investigation.",
    prior_text="We operate in a competitive market.",
)
# disclosure_change_score is a float here because prior_text was supplied.
print(diff.disclosure_change_score, diff.new_topics)
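Because disclosure_change_score is None when no prior text is supplied, downstream code should check for None explicitly before formatting or comparing the score. The helper below is a hypothetical sketch (the name `format_change_score` and the output strings are not part of the library) showing one way to do that.

```python
from typing import Optional


def format_change_score(score: Optional[float]) -> str:
    """Render a change score, treating None as 'no prior text available'."""
    # Test with `is None`, not truthiness: 0.0 is a legitimate score.
    if score is None:
        return "n/a (no prior text)"
    return f"{score:.3f}"


print(format_change_score(None))
print(format_change_score(0.0))
```

In practice this would be called as `format_change_score(diff.disclosure_change_score)` on a SectionDiffResult.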

Full API

class disclosure_alpha.diff_engine.SectionDiffResult(current_section_id: str | None = None, prior_section_id: str | None = None, lexical_similarity: float | None = None, semantic_similarity: float | None = None, length_change_pct: float | None = None, new_topics: list[str] = <factory>, removed_topics: list[str] = <factory>, intensified_topics: list[str] = <factory>, disclosure_change_score: float | None = None, disclosure_change_score_v2: float | None = None, diff_summary: str = '', confidence_score: float = 0.0, language_deltas: dict[str, float] = <factory>, added_sentence_count: int = 0, removed_sentence_count: int = 0, changed_numeric_count: int = 0, added_risk_language_score: float | None = None, diff_evidence: dict[str, typing.Any] = <factory>)

Bases: object

current_section_id : str | None = None
prior_section_id : str | None = None
lexical_similarity : float | None = None
semantic_similarity : float | None = None
length_change_pct : float | None = None
new_topics : list[str]
removed_topics : list[str]
intensified_topics : list[str]
disclosure_change_score : float | None = None
disclosure_change_score_v2 : float | None = None
diff_summary : str = ''
confidence_score : float = 0.0
language_deltas : dict[str, float]
added_sentence_count : int = 0
removed_sentence_count : int = 0
changed_numeric_count : int = 0
added_risk_language_score : float | None = None
diff_evidence : dict[str, Any]

disclosure_alpha.diff_engine.lexical_similarity(text_a: str, text_b: str) -> float

disclosure_alpha.diff_engine.extract_topics(text: str) -> set[str]

disclosure_alpha.diff_engine.compute_section_diff(*, current_text: str, prior_text: str | None, current_section_id: str | None = None, prior_section_id: str | None = None) -> SectionDiffResult