Evidence and Validation¶

Audience: Readers evaluating whether to trust Disclosure Alpha scores.

Before you start: Skim What This Does and Does Not Claim for product scope and unsupported claims.

Summary¶

Deterministic scoring (`deterministic_scoring_v2`) was checked on S&P 500 FY2025 Item 1A risk-factor text. The table below is the public evidence record: two construct checks against independent references, plus one descriptive post-filing volatility association.

Last updated: 2026-06-23.

Check	Result
Analysis cohort	478 firms (FY2025 Item 1A, S&P 500 universe n=503)
Specificity construct validity	Spearman ρ ≈ 0.87 vs NER entity density (n=478)
Boilerplate construct validity	Spearman ρ ≈ 0.74 vs cross-firm 4-gram boilerplate proxy (n=478)
Post-filing volatility association	Q5/Q1 ≈ 1.15 on 90-day realized vol (n=435)

The construct rows show that our metrics correlate with independent external references. The volatility row is a descriptive, in-sample association only: it is not return prediction, alpha, or investment advice.

Cohort¶

Field	Value
Universe	S&P 500 (`data/universe/sp500.csv`, 503 names)
Fiscal year	2025
Section	Item 1A risk factors (`item_1a_risk_factors`)
Rows after quality filters	478
Rows skipped (short text)	19
Scoring model	`deterministic_scoring_v2`
Parser / metrics / dictionary	`section_extractor_v1`, `text_metrics_v3`, `built_in_dictionaries_v3`

The analysis cohort consists of post-filter extractions with sufficient word count and extraction confidence. It does not imply 100% index coverage: 478 of the 503 universe names are retained, 19 rows are skipped for short text, and other tickers are missing from the corpus or fail extraction.

The volatility association uses 435 tickers with valid 90-day realized-vol outcomes — a separate pairing cohort from the 478-firm construct sample.
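The relationship between the two cohorts can be sketched as two successive filters. This is a minimal illustration only: the `Extraction` record, its field names, and both thresholds are hypothetical, and it assumes the pairing cohort is the subset of the construct cohort with a valid outcome, which this page does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds for illustration; the project's real cutoffs are not given here.
MIN_WORDS = 500
MIN_CONFIDENCE = 0.8


@dataclass
class Extraction:
    """Hypothetical record for one firm's Item 1A extraction."""
    ticker: str
    word_count: int
    confidence: float
    realized_vol_90d: Optional[float]  # None when no valid outcome exists


def construct_cohort(rows: List[Extraction]) -> List[Extraction]:
    """Keep rows with sufficient word count and extraction confidence."""
    return [
        r for r in rows
        if r.word_count >= MIN_WORDS and r.confidence >= MIN_CONFIDENCE
    ]


def pairing_cohort(rows: List[Extraction]) -> List[Extraction]:
    """Keep construct-cohort rows that also have a valid volatility outcome."""
    return [r for r in construct_cohort(rows) if r.realized_vol_90d is not None]


# Toy rows, not project data.
rows = [
    Extraction("AAA", 5200, 0.95, 0.31),
    Extraction("BBB", 180, 0.97, 0.28),   # dropped: short text
    Extraction("CCC", 4100, 0.40, 0.22),  # dropped: low extraction confidence
    Extraction("DDD", 6300, 0.91, None),  # kept for construct checks, no vol outcome
]
construct = construct_cohort(rows)
paired = pairing_cohort(rows)
```

In this toy input the construct cohort keeps two rows and the pairing cohort keeps one, which mirrors why the volatility n (435) is smaller than the construct n (478).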

Specificity construct validity¶


Field	Value
Our metric	`company_specificity_per_word` (from `company_specificity_score` / word count)
Reference	spaCy NER entity density (`ner_entity_density`) — independent of our dictionaries
Association	Spearman ρ ≈ 0.87
n	478

Interpretation: Filings that score higher on company-specific language also tend to have higher named-entity density in risk-factor text. This is the strongest construct check in the current release.
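For readers unfamiliar with the statistic, the following is a minimal sketch of a Spearman rank correlation between a per-word specificity value and an entity-density value. It uses only the standard library and toy numbers; the helper names are hypothetical, and the project's own validation code may compute the statistic differently (for example with a statistics library).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy values only, not project data.
specificity_scores = [12.0, 30.0, 9.0, 44.0]
word_counts = [4000, 6000, 4500, 5500]
per_word = [s / w for s, w in zip(specificity_scores, word_counts)]
ner_density = [0.021, 0.034, 0.018, 0.030]

rho = spearman_rho(per_word, ner_density)
```

Because the statistic depends only on ranks, it is insensitive to the different scales of the two measures, which is one reason a rank correlation suits a comparison like this.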

Boilerplate construct validity¶


Field	Value
Our metric	`boilerplate_phrase_ratio` (section-level phrase hit rate)
Reference	Lang & Stice-Lawrence-style cross-firm 4-gram boilerplate proxy (`ls_boilerplate_word_ratio`)
Association	Spearman ρ ≈ 0.74
n	478

Interpretation: Our boilerplate measure moves with a boilerplate proxy from the literature. It is not a full replication of the Lang & Stice-Lawrence 4-gram measure; see Research Foundation for how the built-in metric differs.
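To make the idea of a cross-firm 4-gram proxy concrete, here is a simplified sketch: it marks 4-grams that recur across many firms' texts as common, then reports the share of a firm's words covered by them. The function names, the whitespace tokenization, and the `min_share` threshold are all assumptions for illustration; the published measure and the project's `ls_boilerplate_word_ratio` may differ in tokenization, thresholds, and unit of aggregation.

```python
from collections import Counter


def four_grams(tokens):
    """All consecutive 4-token windows in a token list."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def common_four_grams(docs, min_share):
    """4-grams that occur in at least min_share of the documents."""
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(four_grams(tokens)))  # count each 4-gram once per doc
    cutoff = min_share * len(docs)
    return {gram for gram, count in doc_freq.items() if count >= cutoff}


def boilerplate_word_ratio(tokens, common):
    """Share of word positions covered by at least one common 4-gram."""
    if not tokens:
        return 0.0
    covered = [False] * len(tokens)
    for i, gram in enumerate(four_grams(tokens)):
        if gram in common:
            for k in range(i, i + 4):
                covered[k] = True
    return sum(covered) / len(tokens)


# Toy texts, not project data.
docs = {
    "AAA": "we may be unable to compete".split(),
    "BBB": "we may be unable to hire".split(),
    "CCC": "our fab in taiwan floods often".split(),
}
common = common_four_grams(docs, min_share=0.5)  # hypothetical threshold
ratio_aaa = boilerplate_word_ratio(docs["AAA"], common)
ratio_ccc = boilerplate_word_ratio(docs["CCC"], common)
```

In the toy input, the phrasing shared by AAA and BBB covers five of AAA's six words, while CCC's firm-specific sentence contains no shared 4-gram.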

Post-filing volatility association¶


Field	Value
Score	`overall_disclosure_risk_score` (v2)
Outcome	`realized_vol_90d` (90-day realized volatility post-filing)
Quintile ratio	Q5 mean / Q1 mean ≈ 1.15
n	435
Direction	Highest-risk quintile shows higher realized vol than lowest

Interpretation: Firms in the highest overall-risk quintile had roughly 15% higher average 90-day realized volatility than the lowest quintile in this sample. The effect is modest. Do not treat this as a tradeable signal or validated alpha.
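The quintile ratio can be illustrated with a short sketch. It assumes quintiles are equal-count bins formed by sorting firms on the score, which this page does not specify; the function name and the toy numbers are hypothetical.

```python
from statistics import mean


def quintile_ratio(pairs):
    """Mean outcome in the top score quintile divided by the bottom quintile.

    pairs: list of (score, outcome) tuples, one per firm.
    Assumes equal-count bins by score rank (an illustrative choice).
    """
    if len(pairs) < 5:
        raise ValueError("need at least 5 observations to form quintiles")
    ordered = sorted(pairs, key=lambda p: p[0])
    n = len(ordered)
    q1 = [outcome for _, outcome in ordered[: n // 5]]      # lowest scores
    q5 = [outcome for _, outcome in ordered[4 * n // 5:]]   # highest scores
    return mean(q5) / mean(q1)


# Toy (score, realized vol) pairs, not project data.
pairs = [
    (1, 0.20), (2, 0.20), (3, 0.21), (4, 0.21), (5, 0.22),
    (6, 0.22), (7, 0.22), (8, 0.22), (9, 0.23), (10, 0.23),
]
ratio = quintile_ratio(pairs)
pct_higher = (ratio - 1.0) * 100.0  # a ratio of 1.15 reads as about 15% higher
```

A ratio compares only the two extreme bins, so it says nothing about whether the middle quintiles are ordered, and it carries no information about statistical significance.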

Not claimed

We do not claim earnings-surprise prediction: in v2 cache-mode runs, change score versus earnings surprise did not show the expected monotonic pattern. See What This Does and Does Not Claim.

What this does not prove¶

Buy/sell signals or expected returns
Full S&P 500 coverage on every ticker and fiscal year
That headline risk scores are “correct” in an absolute sense — only that components relate to chosen external references and one vol outcome in-sample
Comparable numeric levels between deterministic_scoring_v1 and v2 (different scales — see Versioning and Reproducibility)

Reproducing checks (contributors)¶

The public `main` branch does not ship validation scripts or report JSON. Maintainers refresh the evidence on the `internal` git branch; see `INTERNAL_VALIDATION.md` in the repository. After a re-run, update the table on this page if headline numbers change.

Related¶

What This Does and Does Not Claim — supported vs unsupported product claims
Research Foundation — literature motivation for metrics
Versioning and Reproducibility — artifact versions and v1 legacy opt-in
Legal and Disclaimer — not investment advice
