pipeline
Use when: You want the full in-memory path from filing HTML (or EDGAR ticker) to deterministic scores in one call — or you need step-by-step control over extraction, metrics, and aggregation.
Start here
- score_filing_html(): score local HTML; optional prior_html for diffs
- score_filing_ticker(): fetch from EDGAR by ticker, fiscal year, and form type
- compute_section_metrics(): extract metrics, flags, and diffs without aggregating to components
- extract_sections_from_html(): section extraction only
- score_deterministic(): aggregate an existing MetricsResult (legacy v1 aggregation; prefer score_for_model())
- FilingScoreResult: typed result with .scores and .to_dict()
Example
from disclosure_alpha import score_filing_html, score_filing_ticker

# Local HTML
with open("filing.html") as f:
    html = f.read()
result = score_filing_html(html, "10-K")
print(result.scores.overall_disclosure_risk_score)

# EDGAR (requires SEC_USER_AGENT)
result = score_filing_ticker("AAPL", 2025, form_type="10-K")
print(result.to_dict()["scores"]["components"])
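For the step-by-step path mentioned at the top of this page, the stages can be chained by hand. The following is a minimal sketch that relies only on the signatures listed under Full API. The helper name score_stepwise is hypothetical, the prior filing is assumed to share the current form type, and whether score_filing_html() performs exactly these steps internally is not stated here.

```python
def score_stepwise(html, form_type, prior_html=None):
    """Hypothetical helper: run extraction, metrics, and aggregation separately."""
    # Imported inside the function so this sketch loads even where the
    # package is not installed.
    from disclosure_alpha.pipeline import (
        compute_section_metrics,
        extract_sections_from_html,
        score_for_model,
    )

    sections = extract_sections_from_html(html, form_type)

    # Assumption: the prior filing uses the same form type as the current one.
    prior_sections = (
        extract_sections_from_html(prior_html, form_type)
        if prior_html is not None
        else None
    )

    metrics = compute_section_metrics(sections, prior_sections)

    # scoring_model_version is left at its default, which the reference
    # describes as SCORING_MODEL_VERSION / v2.
    scores = score_for_model(metrics, form_type=form_type)
    return sections, metrics, scores
```

Keeping the intermediate sections and metrics around makes it possible to inspect or filter them before aggregation, which the one-call functions do not expose.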
Custom scoring config (Python SDK)
Tune headline component weights, flag scoring constants, and v2 calibration context without forking the parser or metrics engine. Pass config=PipelineConfig(...) to score_filing_html(), score_filing_ticker(), score_for_model(), or score_panel_tickers().
from disclosure_alpha import PipelineConfig, ScoringConfig
from disclosure_alpha.baselines import CalibrationContext
from disclosure_alpha.pipeline import score_filing_html
from disclosure_alpha.scoring_types import COMPONENT_WEIGHTS

# Copy the built-in weights so the module-level defaults stay untouched
weights = dict(COMPONENT_WEIGHTS)
weights["disclosure_change_score"] = 0.25
weights["risk_factor_intensity_score"] = 0.10

config = PipelineConfig(
    scoring=ScoringConfig(
        config_id="change_heavy_v1",
        component_weights=weights,
    ),
    calibration_context=CalibrationContext(form_type="10-K", sector="financials"),
)

# html is the filing HTML as a str
result = score_filing_html(html, "10-K", config=config)
print(result.versions["analytics_config_id"])  # change_heavy_v1
Default behavior is unchanged when config is omitted. Responses include versions.analytics_config_id (builtin_default for built-in weights).
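Because the weights are edited by key on a plain dict copy, a misspelled key would add a new entry instead of changing an existing one; how ScoringConfig treats unknown keys is not stated on this page. A minimal sketch of a guard, where override_weights is a hypothetical helper and the base values are stand-ins rather than the real COMPONENT_WEIGHTS defaults:

```python
def override_weights(base, **overrides):
    """Return a copy of base with selected weights replaced.

    Raises KeyError for names missing from base, so a misspelled
    component name fails loudly instead of adding a new key.
    """
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        raise KeyError(f"unknown component weight(s): {', '.join(unknown)}")
    updated = dict(base)
    updated.update(overrides)
    return updated


# Stand-in values for illustration only; the real defaults live in
# disclosure_alpha.scoring_types.COMPONENT_WEIGHTS.
base = {"disclosure_change_score": 0.20, "risk_factor_intensity_score": 0.15}
weights = override_weights(base, disclosure_change_score=0.25)
```

The returned dict could then be passed as component_weights in the same way as the hand-edited copy.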
Full API
In-memory deterministic pipeline: HTML → sections → metrics → scores.
- class disclosure_alpha.pipeline.FilingBundle(ref: Any, html: str, prior_html: str | None, prior_accession: str | None)
  Bases: object
  - ref : Any
  - html : str
  - prior_html : str | None
  - prior_accession : str | None
- class disclosure_alpha.pipeline.FilingSectionsResult(sections: list[ExtractedSection], filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)
  Bases: object
  - sections : list[ExtractedSection]
  - filing : dict[str, Any]
  - versions : dict[str, str]
- class disclosure_alpha.pipeline.FilingMetricsResult(metrics: MetricsResult, filing: dict[str, Any] = <factory>, versions: dict[str, str] = <factory>)
  Bases: object
  - metrics : MetricsResult
  - filing : dict[str, Any]
  - versions : dict[str, str]
- class disclosure_alpha.pipeline.MetricsResult(section_metrics: dict[str, dict[str, float]], section_diffs: dict[str, float | None], section_flags: dict[str, dict[str, bool]], section_densities: dict[str, dict[str, float]], language_deltas: dict[str, dict[str, float]], section_diffs_v2: dict[str, float | None] = <factory>, extraction_confs: list[float] = <factory>, diff_confs: list[float] = <factory>, extraction_warnings: list[str] = <factory>, required_sections_present: bool = True, has_prior: bool = True)
  Bases: object
  - section_metrics : dict[str, dict[str, float]]
  - section_diffs : dict[str, float | None]
  - section_flags : dict[str, dict[str, bool]]
  - section_densities : dict[str, dict[str, float]]
  - language_deltas : dict[str, dict[str, float]]
  - section_diffs_v2 : dict[str, float | None]
  - extraction_confs : list[float]
  - diff_confs : list[float]
  - extraction_warnings : list[str]
  - required_sections_present : bool = True
  - has_prior : bool = True
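The section_diffs field maps each section name to float | None, so consumers need to handle missing values. A minimal sketch, assuming None means no diff was computed for that section (for example, when there is no prior filing) and that a larger value means more change; neither assumption is confirmed on this page, and the section names and numbers below are made up for illustration.

```python
def rank_section_diffs(section_diffs):
    """Return (section, diff) pairs that have a numeric diff, largest first."""
    scored = [(name, diff) for name, diff in section_diffs.items() if diff is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative input only; real keys come from the extracted sections.
example = {"risk_factors": 0.42, "mda": 0.10, "legal_proceedings": None}
ranked = rank_section_diffs(example)
```

Checking has_prior before reading the diffs is a reasonable first step, since the field suggests diffs depend on a prior filing being available.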
- class disclosure_alpha.pipeline.FilingScoreResult(sections: list[ExtractedSection], metrics: MetricsResult, scores: DeterministicAggregationResult, versions: dict[str, str] = <factory>, filing: dict[str, Any] = <factory>)
  Bases: object
  - sections : list[ExtractedSection]
  - metrics : MetricsResult
  - scores : DeterministicAggregationResult
  - versions : dict[str, str]
  - filing : dict[str, Any]
- disclosure_alpha.pipeline.extract_sections_from_html(html: str, form_type: str, *, cik: str = '', accession_number: str = '') → list[ExtractedSection]
- disclosure_alpha.pipeline.compute_section_metrics(sections: list[ExtractedSection], prior_sections: list[ExtractedSection] | None = None) → MetricsResult
- disclosure_alpha.pipeline.score_deterministic(metrics: MetricsResult, *, config: PipelineConfig | None = None) → DeterministicAggregationResult
  Legacy v1 aggregation (deterministic_scoring_v1). Prefer score_for_model().
- disclosure_alpha.pipeline.score_deterministic_v2(metrics: MetricsResult, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult
  v2 aggregation (deterministic_scoring_v2).
- disclosure_alpha.pipeline.score_for_model(metrics: MetricsResult, scoring_model_version: str | None = None, *, config: PipelineConfig | None = None, form_type: str | None = None) → DeterministicAggregationResult
  Score metrics with the requested model (default: SCORING_MODEL_VERSION / v2).
- disclosure_alpha.pipeline.score_filing_html(html: str, form_type: str, *, prior_html: str | None = None, prior_form_type: str | None = None, cik: str = '', accession_number: str = '', config: PipelineConfig | None = None) → FilingScoreResult
- disclosure_alpha.pipeline.filter_metrics_result(metrics: MetricsResult, section_names: set[str]) → MetricsResult
- disclosure_alpha.pipeline.filter_sections(sections: list[ExtractedSection], section_names: set[str]) → list[ExtractedSection]
- disclosure_alpha.pipeline.load_filing_bundle(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingBundle
- disclosure_alpha.pipeline.sections_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingSectionsResult
- disclosure_alpha.pipeline.metrics_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True) → FilingMetricsResult
- disclosure_alpha.pipeline.score_filing_ticker(ticker: str, fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, config: PipelineConfig | None = None) → FilingScoreResult
- class disclosure_alpha.pipeline.PanelTickerResult(ticker: str, status: str, filing: dict[str, Any] | None = None, scores: DeterministicAggregationResult | None = None, error: str | None = None)
  Bases: object
  - ticker : str
  - status : str
  - filing : dict[str, Any] | None = None
  - scores : DeterministicAggregationResult | None = None
  - error : str | None = None
- class disclosure_alpha.pipeline.PanelBatchResult(results: list[PanelTickerResult], summary: dict[str, int], versions: dict[str, str])
  Bases: object
  - results : list[PanelTickerResult]
  - summary : dict[str, int]
  - versions : dict[str, str]
- disclosure_alpha.pipeline.score_panel_tickers(tickers: list[str], fiscal_year: int, *, form_type: str = '10-K', quarter: str | None = None, use_cache: bool = True, compare_prior: bool = True, scoring_model_version: str = 'deterministic_scoring_v2', config: PipelineConfig | None = None) → PanelBatchResult
  Score many tickers sequentially; per-ticker errors do not fail the batch.
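The "per-ticker errors do not fail the batch" behavior can be pictured as a loop that catches each failure and records it on that ticker's result. This is a sketch of the general pattern, not the library's implementation: TickerOutcome, score_many, and the "ok" / "error" status strings and summary keys are hypothetical stand-ins that only mirror the ticker, status, and error fields and the dict[str, int] summary documented above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TickerOutcome:
    """Hypothetical stand-in mirroring three PanelTickerResult fields."""
    ticker: str
    status: str
    error: Optional[str] = None


def score_many(tickers, score_one):
    """Call score_one(ticker) sequentially, isolating failures per ticker."""
    results = []
    for ticker in tickers:
        try:
            score_one(ticker)
        except Exception as exc:  # one bad ticker must not stop the loop
            results.append(TickerOutcome(ticker, "error", str(exc)))
        else:
            results.append(TickerOutcome(ticker, "ok"))
    # Count outcomes per status (the key names are assumptions).
    summary = {"ok": 0, "error": 0}
    for outcome in results:
        summary[outcome.status] += 1
    return results, summary
```

With the real function, the equivalent check would be to iterate over PanelBatchResult.results and inspect each entry's status and error rather than wrapping the call in a try block.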