pipeline

Use when: You want the full in-memory path from filing HTML (or EDGAR ticker) to deterministic scores in one call — or you need step-by-step control over extraction, metrics, and aggregation.

Start here

  • score_filing_html() — score local HTML; optional prior_html for diffs

  • score_filing_ticker() — fetch from EDGAR by ticker, fiscal year, and form type

  • compute_section_metrics() — extract metrics, flags, diffs without aggregating to components

  • extract_sections_from_html() — section extraction only

  • score_deterministic() — aggregate an existing MetricsResult

  • FilingScoreResult — typed result with .scores and .to_dict()
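The step-by-step path can be sketched by chaining three of the functions listed above. This is a minimal sketch, not the library's own implementation of score_filing_html(): the helper name score_stepwise is hypothetical, and forwarding form_type to score_for_model() is an assumption based only on its signature.

```python
def score_stepwise(html, form_type, prior_html=None):
    """Run extraction, metrics, and aggregation as separate steps (illustrative)."""
    # Imported lazily so the helper can be defined without the package installed.
    from disclosure_alpha.pipeline import (
        compute_section_metrics,
        extract_sections_from_html,
        score_for_model,
    )

    sections = extract_sections_from_html(html, form_type)
    # Prior sections are optional; they feed the diffs against the prior filing.
    prior_sections = (
        extract_sections_from_html(prior_html, form_type) if prior_html else None
    )
    metrics = compute_section_metrics(sections, prior_sections)
    scores = score_for_model(metrics, form_type=form_type)
    return sections, metrics, scores
```

Keeping the intermediate values makes it possible to inspect sections or metrics before deciding how to aggregate them.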

Example

from pathlib import Path

from disclosure_alpha import score_filing_html, score_filing_ticker

# Local HTML (read_text() closes the file handle once the content is read)
html = Path("filing.html").read_text()
result = score_filing_html(html, "10-K")
print(result.scores.overall_disclosure_risk_score)

# EDGAR (requires SEC_USER_AGENT)
result = score_filing_ticker("AAPL", 2025, form_type="10-K")
print(result.to_dict()["scores"]["components"])
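The optional prior_html argument can be exercised with a sketch like the following. The helper names score_with_prior and sections_with_diffs and the file paths are hypothetical. The only library details relied on are the prior_html keyword of score_filing_html() and the section_diffs field of MetricsResult, both documented on this page. How a diff value should be interpreted is not specified here.

```python
from pathlib import Path


def score_with_prior(current_path, prior_path, form_type="10-K"):
    """Score a local filing against a prior filing stored next to it (illustrative)."""
    # Imported lazily so the helper can be defined without the package installed.
    from disclosure_alpha.pipeline import score_filing_html

    html = Path(current_path).read_text()
    prior_html = Path(prior_path).read_text()
    return score_filing_html(html, form_type, prior_html=prior_html)


def sections_with_diffs(result):
    """Return {section: diff} for sections whose diff is not None."""
    diffs = result.metrics.section_diffs
    return {name: value for name, value in diffs.items() if value is not None}
```

A call such as sections_with_diffs(score_with_prior("fy2025.html", "fy2024.html")) would then list only the sections for which a diff could be computed.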

Custom scoring config (Python SDK)

Tune headline component weights, flag scoring constants, and v2 calibration context without forking the parser or metrics engine. Pass config=PipelineConfig(...) to score_filing_html(), score_filing_ticker(), score_for_model(), or score_panel_tickers().

from disclosure_alpha import PipelineConfig, ScoringConfig
from disclosure_alpha.baselines import CalibrationContext
from disclosure_alpha.pipeline import score_filing_html
from disclosure_alpha.scoring_types import COMPONENT_WEIGHTS

weights = dict(COMPONENT_WEIGHTS)
weights["disclosure_change_score"] = 0.25
weights["risk_factor_intensity_score"] = 0.10

config = PipelineConfig(
    scoring=ScoringConfig(
        config_id="change_heavy_v1",
        component_weights=weights,
    ),
    calibration_context=CalibrationContext(form_type="10-K", sector="financials"),
)
with open("filing.html") as fh:  # the filing HTML to score
    html = fh.read()
result = score_filing_html(html, "10-K", config=config)
print(result.versions["analytics_config_id"])  # change_heavy_v1

Omitting config leaves the default behavior unchanged. Responses always include versions.analytics_config_id, which is builtin_default when the built-in weights are in use.
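This page does not say whether component weights must sum to a fixed total, or whether the library renormalizes them. If you want an override to leave the total unchanged, one option is to rescale the untouched weights yourself. The helper below is a hypothetical, pure-Python sketch and not part of disclosure_alpha.

```python
def override_weights(base, overrides):
    """Apply overrides, then rescale the untouched weights so the total is preserved."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")

    total = sum(base.values())
    fixed = sum(overrides.values())
    rest = {name: w for name, w in base.items() if name not in overrides}
    rest_total = sum(rest.values())
    if fixed > total or (rest and rest_total == 0):
        raise ValueError("overrides leave no room to rescale the remaining weights")

    # Spread whatever is left of the original total across the untouched weights.
    scale = (total - fixed) / rest_total if rest else 1.0
    merged = {name: w * scale for name, w in rest.items()}
    merged.update(overrides)
    return merged
```

For example, override_weights(dict(COMPONENT_WEIGHTS), {"disclosure_change_score": 0.25}) would return a new dict that can be passed as component_weights. The input mapping is not modified.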

Full API

In-memory deterministic pipeline: HTML → sections → metrics → scores.

class disclosure_alpha.pipeline.FilingBundle(ref: Any, html: str, prior_html: str | None, prior_accession: str | None)

Bases: object

ref : Any
html : str
prior_html : str | None
prior_accession : str | None

class disclosure_alpha.pipeline.FilingSectionsResult(sections: list[ExtractedSection], filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)

Bases: object

sections : list[ExtractedSection]
filing : dict[str, Any]
versions : dict[str, str]

class disclosure_alpha.pipeline.FilingMetricsResult(metrics: MetricsResult, filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)

Bases: object

metrics : MetricsResult
filing : dict[str, Any]
versions : dict[str, str]

class disclosure_alpha.pipeline.MetricsResult(section_metrics: dict[str, dict[str, float]], section_diffs: dict[str, float | None], section_flags: dict[str, dict[str, bool]], section_densities: dict[str, dict[str, float]], language_deltas: dict[str, dict[str, float]], section_diffs_v2: dict[str, float | None] = <factory>, extraction_confs: list[float] = <factory>, diff_confs: list[float] = <factory>, extraction_warnings: list[str] = <factory>, required_sections_present: bool = True, has_prior: bool = True)

Bases: object

section_metrics : dict[str, dict[str, float]]
section_diffs : dict[str, float | None]
section_flags : dict[str, dict[str, bool]]
section_densities : dict[str, dict[str, float]]
language_deltas : dict[str, dict[str, float]]
section_diffs_v2 : dict[str, float | None]
extraction_confs : list[float]
diff_confs : list[float]
extraction_warnings : list[str]
required_sections_present : bool = True
has_prior : bool = True
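
Because section_flags is a nested mapping of section name to flag name to bool, a short helper can reduce it to the flags that are actually set. The function name raised_flags is hypothetical. It only reads the section_flags attribute documented above, so it works on any object with that shape.

```python
def raised_flags(metrics):
    """Map each section name to the sorted list of its flags that are True."""
    raised = {}
    for section, flags in metrics.section_flags.items():
        names = sorted(name for name, is_set in flags.items() if is_set)
        if names:  # skip sections with no flags set
            raised[section] = names
    return raised
```

Sections with no raised flags are omitted, so an empty dict means nothing was flagged.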

class disclosure_alpha.pipeline.FilingScoreResult(sections: list[ExtractedSection], metrics: MetricsResult, scores: DeterministicAggregationResult, versions: dict[str, str] = <factory>, filing: dict[str, Any] = <factory>)

Bases: object

sections : list[ExtractedSection]
metrics : MetricsResult
scores : DeterministicAggregationResult
versions : dict[str, str]
filing : dict[str, Any]
to_dict() → dict[str, Any]
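
Since to_dict() returns a plain dict, a result can be serialized with the standard json module. This is a sketch under one assumption: the page does not state that every value in the dict is JSON-serializable, so default=str is used defensively. The helper name result_to_json is hypothetical.

```python
import json


def result_to_json(result):
    """Serialize any object exposing to_dict() to stable, pretty-printed JSON."""
    # default=str is a fallback for values json cannot encode natively (an assumption).
    return json.dumps(result.to_dict(), indent=2, sort_keys=True, default=str)
```

Sorting keys keeps the output stable across runs, which helps when diffing stored results.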

disclosure_alpha.pipeline.extract_sections_from_html(html: str, form_type: str, *, cik: str = '', accession_number: str = '') → list[ExtractedSection]

disclosure_alpha.pipeline.compute_section_metrics(sections: list[ExtractedSection], prior_sections: list[ExtractedSection] | None = None) → MetricsResult

disclosure_alpha.pipeline.score_deterministic(metrics: MetricsResult, *, config: PipelineConfig | None = None) → DeterministicAggregationResult

Legacy v1 aggregation (deterministic_scoring_v1). Prefer score_for_model().

disclosure_alpha.pipeline.score_deterministic_v2(metrics: MetricsResult, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult

v2 aggregation (deterministic_scoring_v2).

disclosure_alpha.pipeline.score_for_model(metrics: MetricsResult, scoring_model_version: str | None = None, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult

Score metrics with the requested model (default: SCORING_MODEL_VERSION / v2).
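
To see how the two aggregation models differ on the same metrics, score_for_model() can be called once per version. The sketch below assumes that the identifiers deterministic_scoring_v1 and deterministic_scoring_v2, which appear elsewhere on this page, are the strings accepted by scoring_model_version. The helper name compare_model_versions is hypothetical.

```python
def compare_model_versions(metrics, form_type="10-K"):
    """Score one MetricsResult under both model versions (illustrative)."""
    # Imported lazily so the helper can be defined without the package installed.
    from disclosure_alpha.pipeline import score_for_model

    # Assumed version identifiers, taken from the descriptions on this page.
    versions = ("deterministic_scoring_v1", "deterministic_scoring_v2")
    return {
        version: score_for_model(metrics, version, form_type=form_type)
        for version in versions
    }
```

The metrics are computed once and reused, so any difference between the two results comes from aggregation alone.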

disclosure_alpha.pipeline.score_filing_html(html: str, form_type: str, *, prior_html: str | None = None, prior_form_type: str | None = None, cik: str = '', accession_number: str = '', config: PipelineConfig | None = None) → FilingScoreResult

disclosure_alpha.pipeline.filter_metrics_result(metrics: MetricsResult, section_names: set[str]) → MetricsResult

disclosure_alpha.pipeline.filter_sections(sections: list[ExtractedSection], section_names: set[str]) → list[ExtractedSection]

disclosure_alpha.pipeline.load_filing_bundle(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingBundle

disclosure_alpha.pipeline.sections_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingSectionsResult

disclosure_alpha.pipeline.metrics_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingMetricsResult

disclosure_alpha.pipeline.score_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, config: PipelineConfig | None = None) → FilingScoreResult
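
One possible use of load_filing_bundle() is to fetch a filing once and score it under several configurations. This is a sketch based only on the signatures above: the helper name score_under_configs is hypothetical, and it assumes that passing bundle.html and bundle.prior_html to score_filing_html() is a reasonable stand-in for what score_filing_ticker() does internally.

```python
def score_under_configs(ticker, fiscal_year, configs, form_type="10-K"):
    """Fetch one filing bundle, then score it once per config (illustrative).

    configs maps a label to a PipelineConfig, or to None for the defaults.
    """
    # Imported lazily so the helper can be defined without the package installed.
    from disclosure_alpha.pipeline import load_filing_bundle, score_filing_html

    bundle = load_filing_bundle(ticker, fiscal_year, form_type=form_type)
    return {
        label: score_filing_html(
            bundle.html,
            form_type,
            prior_html=bundle.prior_html,
            config=config,
        )
        for label, config in configs.items()
    }
```

Passing {"default": None, "change_heavy": config} would return two FilingScoreResult objects built from the same fetched HTML.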

class disclosure_alpha.pipeline.PanelTickerResult(ticker: str, status: str, filing: dict[str, Any] | None = None, scores: DeterministicAggregationResult | None = None, error: str | None = None)

Bases: object

ticker : str
status : str
filing : dict[str, Any] | None = None
scores : DeterministicAggregationResult | None = None
error : str | None = None

class disclosure_alpha.pipeline.PanelBatchResult(results: list[PanelTickerResult], summary: dict[str, int], versions: dict[str, str])

Bases: object

results : list[PanelTickerResult]
summary : dict[str, int]
versions : dict[str, str]

disclosure_alpha.pipeline.score_panel_tickers(tickers: list[str], fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, scoring_model_version: str = 'deterministic_scoring_v2', config: PipelineConfig | None = None) → PanelBatchResult

Score many tickers sequentially; per-ticker errors do not fail the batch.
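
Because per-ticker errors are recorded rather than raised, callers need to separate successes from failures themselves. The sketch below does so using only the documented fields ticker, scores, and error. The possible values of status are not listed on this page, so the helper tests error instead. The name summarize_panel is hypothetical.

```python
def summarize_panel(batch):
    """Split a PanelBatchResult-like object into ranked scores and failures."""
    scored = [r for r in batch.results if r.error is None and r.scores is not None]
    failed = {r.ticker: r.error for r in batch.results if r.error is not None}

    # Highest overall score first.
    scored.sort(key=lambda r: r.scores.overall_disclosure_risk_score, reverse=True)
    ranked = [(r.ticker, r.scores.overall_disclosure_risk_score) for r in scored]
    return ranked, failed
```

After ranked, failed = summarize_panel(score_panel_tickers(["AAPL", "MSFT"], 2025)), the failed dict gives the error message for each ticker that could not be scored.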