pipeline

Use when: You want the full in-memory path from filing HTML (or an EDGAR ticker) to deterministic scores in one call, or you need step-by-step control over extraction, metrics, and aggregation.

Start here

score_filing_html(): score local HTML; pass the optional prior_html to compute diffs against the prior filing
score_filing_ticker(): fetch from EDGAR by ticker, fiscal year, and form type, then score
compute_section_metrics(): extract metrics, flags, and diffs without aggregating to components
extract_sections_from_html(): section extraction only
score_deterministic(): aggregate an existing MetricsResult (legacy v1 aggregation; score_for_model() is preferred)
FilingScoreResult: typed result with .scores and .to_dict()

Example

from pathlib import Path

from disclosure_alpha import score_filing_html, score_filing_ticker

# Local HTML (read_text() opens and closes the file for us)
html = Path("filing.html").read_text()
result = score_filing_html(html, "10-K")
print(result.scores.overall_disclosure_risk_score)

# EDGAR (requires SEC_USER_AGENT)
result = score_filing_ticker("AAPL", 2025, form_type="10-K")
print(result.to_dict()["scores"]["components"])
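
For step-by-step control, the same stages can be called one at a time. The following is a minimal sketch built only from the signatures listed under Full API; it is not necessarily what score_filing_html() does internally. The helper name score_step_by_step is hypothetical, the prior filing is assumed to share the current form type, and the import is deferred into the function body only so the snippet can be loaded without the package installed.

```python
from __future__ import annotations

from typing import Any


def score_step_by_step(html: str, form_type: str, prior_html: str | None = None) -> Any:
    """Run extraction, metrics, and aggregation as three explicit calls.

    Hypothetical helper: it returns whatever score_for_model() returns
    (documented as a DeterministicAggregationResult).
    """
    # Deferred import (an assumption of this sketch, not a library requirement).
    from disclosure_alpha.pipeline import (
        compute_section_metrics,
        extract_sections_from_html,
        score_for_model,
    )

    # Stage 1: section extraction for the current filing and, if given, the prior one.
    sections = extract_sections_from_html(html, form_type)
    prior_sections = None
    if prior_html is not None:
        # Assumption: the prior filing uses the same form type as the current one.
        prior_sections = extract_sections_from_html(prior_html, form_type)

    # Stage 2: per-section metrics, flags, and diffs (no aggregation yet).
    metrics = compute_section_metrics(sections, prior_sections)

    # Stage 3: aggregation with the default scoring model.
    return score_for_model(metrics, form_type=form_type)
```

Splitting the stages this way leaves room to inspect or filter the intermediate MetricsResult before it is aggregated.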

Custom scoring config (Python SDK)

Tune headline component weights, flag scoring constants, and v2 calibration context without forking the parser or metrics engine. Pass config=PipelineConfig(...) to score_filing_html(), score_filing_ticker(), score_for_model(), or score_panel_tickers().

from disclosure_alpha import PipelineConfig, ScoringConfig
from disclosure_alpha.baselines import CalibrationContext
from disclosure_alpha.pipeline import score_filing_html
from disclosure_alpha.scoring_types import COMPONENT_WEIGHTS

weights = dict(COMPONENT_WEIGHTS)
weights["disclosure_change_score"] = 0.25
weights["risk_factor_intensity_score"] = 0.10

config = PipelineConfig(
    scoring=ScoringConfig(
        config_id="change_heavy_v1",
        component_weights=weights,
    ),
    calibration_context=CalibrationContext(form_type="10-K", sector="financials"),
)
with open("filing.html") as f:  # the filing's HTML as a string
    html = f.read()
result = score_filing_html(html, "10-K", config=config)
print(result.versions["analytics_config_id"])  # change_heavy_v1

Default behavior is unchanged when config is omitted. Responses include versions.analytics_config_id (builtin_default for built-in weights).
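
Overriding individual entries of a weight mapping changes its total. This page does not say whether the library expects the weights to sum to 1 or renormalizes them itself, so the sketch below only illustrates one way to rescale a mapping before passing it as component_weights. The helper override_and_renormalize, the placeholder component name, and the placeholder default values are all hypothetical; the real names and values live in COMPONENT_WEIGHTS.

```python
from __future__ import annotations


def override_and_renormalize(
    defaults: dict[str, float], overrides: dict[str, float]
) -> dict[str, float]:
    """Apply overrides to a weight mapping and rescale it to sum to 1."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Catch typos early instead of silently adding a new component.
        raise KeyError(f"unknown component(s): {sorted(unknown)}")
    merged = {**defaults, **overrides}
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: value / total for name, value in merged.items()}


# Placeholder defaults for illustration only (NOT the library's real values).
placeholder_defaults = {
    "disclosure_change_score": 0.20,
    "risk_factor_intensity_score": 0.20,
    "placeholder_component_score": 0.60,  # hypothetical component name
}
weights = override_and_renormalize(
    placeholder_defaults,
    {"disclosure_change_score": 0.25, "risk_factor_intensity_score": 0.10},
)
```

If the library already normalizes or validates the weights, this step is unnecessary.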

Full API

In-memory deterministic pipeline: HTML → sections → metrics → scores.

class disclosure_alpha.pipeline.FilingBundle(ref: Any, html: str, prior_html: str | None, prior_accession: str | None)

Bases: object

ref : Any

html : str

prior_html : str | None

prior_accession : str | None

class disclosure_alpha.pipeline.FilingSectionsResult(sections: list[ExtractedSection], filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)

Bases: object

sections : list[ExtractedSection]

filing : dict[str, Any]

versions : dict[str, str]

class disclosure_alpha.pipeline.FilingMetricsResult(metrics: MetricsResult, filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)

Bases: object

metrics : MetricsResult

filing : dict[str, Any]

versions : dict[str, str]

class disclosure_alpha.pipeline.MetricsResult(section_metrics: dict[str, dict[str, float]], section_diffs: dict[str, float | None], section_flags: dict[str, dict[str, bool]], section_densities: dict[str, dict[str, float]], language_deltas: dict[str, dict[str, float]], section_diffs_v2: dict[str, float | None] = <factory>, extraction_confs: list[float] = <factory>, diff_confs: list[float] = <factory>, extraction_warnings: list[str] = <factory>, required_sections_present: bool = True, has_prior: bool = True)

Bases: object

section_metrics : dict[str, dict[str, float]]

section_diffs : dict[str, float | None]

section_flags : dict[str, dict[str, bool]]

section_densities : dict[str, dict[str, float]]

language_deltas : dict[str, dict[str, float]]

section_diffs_v2 : dict[str, float | None]

extraction_confs : list[float]

diff_confs : list[float]

extraction_warnings : list[str]

required_sections_present : bool = True

has_prior : bool = True

class disclosure_alpha.pipeline.FilingScoreResult(sections: list[ExtractedSection], metrics: MetricsResult, scores: DeterministicAggregationResult, versions: dict[str, str] = <factory>, filing: dict[str, Any] = <factory>)

Bases: object

sections : list[ExtractedSection]

metrics : MetricsResult

scores : DeterministicAggregationResult

versions : dict[str, str]

filing : dict[str, Any]

to_dict() → dict[str, Any]

disclosure_alpha.pipeline.extract_sections_from_html(html: str, form_type: str, *, cik: str = '', accession_number: str = '') → list[ExtractedSection]

disclosure_alpha.pipeline.compute_section_metrics(sections: list[ExtractedSection], prior_sections: list[ExtractedSection] | None = None) → MetricsResult

disclosure_alpha.pipeline.score_deterministic(metrics: MetricsResult, *, config: PipelineConfig | None = None) → DeterministicAggregationResult

Legacy v1 aggregation (deterministic_scoring_v1). Prefer score_for_model().

disclosure_alpha.pipeline.score_deterministic_v2(metrics: MetricsResult, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult

v2 aggregation (deterministic_scoring_v2).

disclosure_alpha.pipeline.score_for_model(metrics: MetricsResult, scoring_model_version: str | None = None, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult

Score metrics with the requested model (default: SCORING_MODEL_VERSION / v2).
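
When migrating from the legacy aggregation, it can help to run both aggregators on the same MetricsResult and compare the outputs. The sketch below assumes only the two signatures documented above; compare_aggregations is a hypothetical helper name, the dictionary keys are labels chosen for this example, and the import is deferred so the snippet loads without the package installed.

```python
from __future__ import annotations

from typing import Any


def compare_aggregations(metrics: Any, form_type: str | None = None) -> dict[str, Any]:
    """Aggregate one MetricsResult with both the legacy v1 and the v2 scorer."""
    # Deferred import (a choice made for this sketch only).
    from disclosure_alpha.pipeline import score_deterministic, score_deterministic_v2

    return {
        # Legacy aggregation: its documented signature takes no form_type.
        "deterministic_scoring_v1": score_deterministic(metrics),
        # v2 aggregation: form_type is an optional keyword argument.
        "deterministic_scoring_v2": score_deterministic_v2(metrics, form_type=form_type),
    }
```

Both results are documented as DeterministicAggregationResult, so the same fields can be compared side by side.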

disclosure_alpha.pipeline.score_filing_html(html: str, form_type: str, *, prior_html: str | None = None, prior_form_type: str | None = None, cik: str = '', accession_number: str = '', config: PipelineConfig | None = None) → FilingScoreResult

disclosure_alpha.pipeline.filter_metrics_result(metrics: MetricsResult, section_names: set[str]) → MetricsResult

disclosure_alpha.pipeline.filter_sections(sections: list[ExtractedSection], section_names: set[str]) → list[ExtractedSection]

disclosure_alpha.pipeline.load_filing_bundle(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingBundle

disclosure_alpha.pipeline.sections_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingSectionsResult

disclosure_alpha.pipeline.metrics_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingMetricsResult

disclosure_alpha.pipeline.score_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, config: PipelineConfig | None = None) → FilingScoreResult

class disclosure_alpha.pipeline.PanelTickerResult(ticker: str, status: str, filing: dict[str, Any] | None = None, scores: DeterministicAggregationResult | None = None, error: str | None = None)

Bases: object

ticker : str

status : str

filing : dict[str, Any] | None = None

scores : DeterministicAggregationResult | None = None

error : str | None = None

class disclosure_alpha.pipeline.PanelBatchResult(results: list[PanelTickerResult], summary: dict[str, int], versions: dict[str, str])

Bases: object

results : list[PanelTickerResult]

summary : dict[str, int]

versions : dict[str, str]

disclosure_alpha.pipeline.score_panel_tickers(tickers: list[str], fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, scoring_model_version: str = 'deterministic_scoring_v2', config: PipelineConfig | None = None) → PanelBatchResult

Score many tickers sequentially; per-ticker errors do not fail the batch.
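
The "per-ticker errors do not fail the batch" behavior can be pictured with a small stand-alone sketch. This is an illustration of the general pattern, not the library's implementation: TickerOutcome is a hypothetical stand-in that mirrors some PanelTickerResult fields, score_many and the score_one callable are hypothetical, and the status strings "ok" and "error" are assumptions (the real status values are not documented on this page).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TickerOutcome:
    """Hypothetical stand-in mirroring a few PanelTickerResult fields."""

    ticker: str
    status: str  # assumed values: "ok" or "error"
    scores: Any | None = None
    error: str | None = None


def score_many(
    tickers: list[str], score_one: Callable[[str], Any]
) -> tuple[list[TickerOutcome], dict[str, int]]:
    """Score tickers one by one, recording failures instead of raising."""
    outcomes: list[TickerOutcome] = []
    for ticker in tickers:
        try:
            outcomes.append(TickerOutcome(ticker, "ok", scores=score_one(ticker)))
        except Exception as exc:
            # One bad ticker is recorded as an error row; the loop continues.
            outcomes.append(
                TickerOutcome(ticker, "error", error=f"{type(exc).__name__}: {exc}")
            )
    # Count outcomes per status, similar in spirit to PanelBatchResult.summary.
    summary = dict(Counter(outcome.status for outcome in outcomes))
    return outcomes, summary
```

With the real API, the analogous check is to inspect each PanelTickerResult's status and error fields after score_panel_tickers() returns, rather than wrapping the call in try/except.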