Evidence and Validation
Audience: Readers evaluating whether to trust Disclosure Alpha scores.
Before you start: Skim What This Does and Does Not Claim for product scope and unsupported claims.
Summary
Deterministic scoring (deterministic_scoring_v2) was checked on S&P 500 FY2025 Item 1A risk-factor text. The table below is the public evidence record — construct checks against independent references, plus one descriptive post-filing volatility association.
Last updated: 2026-06-23.
| Check | Result |
|---|---|
| Analysis cohort | 478 firms (FY2025 Item 1A, S&P 500 universe n=503) |
| Specificity construct validity | Spearman ρ ≈ 0.87 vs NER entity density (n=478) |
| Boilerplate construct validity | Spearman ρ ≈ 0.74 vs cross-firm 4-gram boilerplate proxy (n=478) |
| Post-filing volatility association | Q5/Q1 ≈ 1.15 on 90-day realized vol (n=435) |
Construct rows show our metrics track external references. The volatility row is a descriptive association only — not return prediction, alpha, or investment advice.
Cohort
| Field | Value |
|---|---|
| Universe | S&P 500 ( |
| Fiscal year | 2025 |
| Section | Item 1A risk factors ( |
| Rows after quality filters | 478 |
| Rows skipped (short text) | 19 |
| Scoring model | |
| Parser / metrics / dictionary | |
The analysis cohort consists of extractions that pass the quality filters for word count and extraction confidence. It does not imply 100% index coverage: some tickers are missing from the corpus or fail extraction.
The volatility association uses the 435 tickers with valid 90-day realized-vol outcomes. This pairing cohort is separate from the 478-firm construct sample.
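The two cohorts can be pictured as a filter step followed by a pairing step. The sketch below is illustrative only: the `Extraction` record, its field names, the `MIN_WORDS` and `MIN_CONFIDENCE` thresholds, and the assumption that pairing starts from the filtered rows are all hypothetical, since this page does not state the actual filter values or pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Extraction:
    """Hypothetical record for one extracted Item 1A section."""
    ticker: str
    word_count: int
    confidence: float


# Hypothetical thresholds; the real filter values are not given on this page.
MIN_WORDS = 500
MIN_CONFIDENCE = 0.8


def construct_cohort(rows: List[Extraction]) -> List[Extraction]:
    """Keep rows with enough text and enough extraction confidence."""
    return [
        r for r in rows
        if r.word_count >= MIN_WORDS and r.confidence >= MIN_CONFIDENCE
    ]


def volatility_cohort(
    cohort: List[Extraction],
    realized_vol: Dict[str, Optional[float]],
) -> List[str]:
    """Keep tickers that also have a usable realized-vol outcome."""
    return [r.ticker for r in cohort if realized_vol.get(r.ticker) is not None]


rows = [
    Extraction("AAA", 4000, 0.95),
    Extraction("BBB", 120, 0.97),   # dropped: short text
    Extraction("CCC", 3500, 0.40),  # dropped: low extraction confidence
    Extraction("DDD", 5200, 0.91),
]
cohort = construct_cohort(rows)
paired = volatility_cohort(cohort, {"AAA": 0.31, "DDD": None})
print([r.ticker for r in cohort], paired)
```

Under these assumptions the pairing cohort can only be as large as the filtered cohort, which is consistent with 435 being smaller than 478, though the page does not confirm that this is how the pairing was built.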
Specificity construct validity
| Field | Value |
|---|---|
| Our metric | |
| Reference | spaCy NER entity density ( |
| Association | Spearman ρ ≈ 0.87 |
| n | 478 |
Interpretation: Filings that score higher on company-specific language also tend to have higher named-entity density in risk-factor text. This is the strongest construct check in the current release.
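For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks rather than raw values. The following is a minimal standard-library sketch of that definition, not the validation code behind the reported figure; the function names and the five sample values are made up for illustration.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for five filings: a specificity score and an entity density.
specificity = [0.10, 0.35, 0.20, 0.80, 0.55]
entity_density = [1.2, 2.0, 2.5, 4.1, 3.0]
print(round(spearman_rho(specificity, entity_density), 2))
```

Because only ranks matter, ρ is insensitive to the scale of either metric, which is why it suits comparing a score against a density measured in different units.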
Boilerplate construct validity
| Field | Value |
|---|---|
| Our metric | |
| Reference | Lang & Stice-Lawrence-style cross-firm 4-gram boilerplate proxy ( |
| Association | Spearman ρ ≈ 0.74 |
| n | 478 |
Interpretation: Our boilerplate measure moves with a boilerplate proxy from the literature. It is not a full replication of the Lang & Stice-Lawrence 4-gram measure. See Research Foundation for how the built-in metric differs.
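To make the idea of a cross-firm 4-gram proxy concrete, here is a toy sketch: it measures what share of a filing's distinct 4-grams also appear in the filings of many other firms. It is neither the published measure nor the reference proxy used for this check. Whitespace tokenization, the `min_firm_share` threshold, and the three sample sentences are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

FourGram = Tuple[str, ...]


def four_grams(text: str) -> Set[FourGram]:
    """Distinct 4-word sequences, using naive lowercase whitespace tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)}


def boilerplate_share(
    filings: Dict[str, str],
    min_firm_share: float = 0.5,  # hypothetical threshold, not from the paper
) -> Dict[str, float]:
    """Share of each firm's distinct 4-grams that are common across firms."""
    grams = {firm: four_grams(text) for firm, text in filings.items()}
    # Count, for every 4-gram, how many firms use it at least once.
    firm_counts = Counter(g for gs in grams.values() for g in gs)
    threshold = min_firm_share * len(filings)
    common = {g for g, c in firm_counts.items() if c >= threshold}
    return {
        firm: (len(gs & common) / len(gs) if gs else 0.0)
        for firm, gs in grams.items()
    }


filings = {
    "A": "our business could be adversely affected by supply shortages",
    "B": "our business could be adversely affected by new regulation",
    "C": "a fire at our Reno plant would halt battery production",
}
shares = boilerplate_share(filings)
print({firm: round(s, 2) for firm, s in shares.items()})
```

In this toy corpus, firms A and B share generic phrasing and score high, while firm C's concrete, firm-specific sentence shares no 4-grams with the others.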
Post-filing volatility association
| Field | Value |
|---|---|
| Score | |
| Outcome | |
| Quintile ratio | Q5 mean / Q1 mean ≈ 1.15 |
| n | 435 |
| Direction | Highest-risk quintile shows higher realized vol than lowest |
Interpretation: Firms in the highest overall-risk quintile had roughly 15% higher average 90-day realized volatility than the lowest quintile in this sample. The effect is modest. Do not treat this as a tradeable signal or validated alpha.
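The quintile ratio can be read as follows: sort firms by score, take the bottom and top fifths, and divide the mean outcome of the top group by that of the bottom group. The sketch below assumes equal-sized top and bottom bins of `n // 5` firms; the binning rule actually used is not stated on this page, and the ten sample values are invented so the ratio is easy to follow.

```python
from statistics import mean
from typing import Sequence


def quintile_ratio(scores: Sequence[float], outcomes: Sequence[float]) -> float:
    """Mean outcome in the top score quintile divided by the bottom quintile."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if len(scores) < 5:
        raise ValueError("need at least 5 observations to form quintiles")
    # Sort firms by score, carrying each firm's outcome along.
    pairs = sorted(zip(scores, outcomes), key=lambda p: p[0])
    size = len(pairs) // 5  # assumption: top and bottom bins hold n // 5 firms
    q1_mean = mean(outcome for _, outcome in pairs[:size])
    q5_mean = mean(outcome for _, outcome in pairs[-size:])
    return q5_mean / q1_mean


# Invented data: ten firms, risk scores 1..10, 90-day realized vol per firm.
risk_scores = list(range(1, 11))
realized_vol = [0.20, 0.20, 0.21, 0.22, 0.22, 0.23, 0.23, 0.24, 0.23, 0.23]
print(round(quintile_ratio(risk_scores, realized_vol), 2))
```

With n=435, this rule would give five bins of 87 firms each. Note that the ratio compares only the two extreme groups, so it says nothing about whether the middle quintiles are ordered.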
Not claimed
We do not claim earnings-surprise prediction (change-score vs surprise did not show the expected monotonic pattern in v2 cache-mode runs). See What This Does and Does Not Claim.
What this does not prove
Buy/sell signals or expected returns
Full S&P 500 coverage on every ticker and fiscal year
That headline risk scores are “correct” in an absolute sense — only that components relate to chosen external references and one vol outcome in-sample
Comparable numeric levels between deterministic_scoring_v1 and v2 (different scales; see Versioning and Reproducibility)
Reproducing checks (contributors)
The public main branch does not ship validation scripts or report JSON. Maintainers refresh the evidence on the internal git branch; see INTERNAL_VALIDATION.md in the repository. After a re-run, update the table on this page if the headline numbers change.
Related
What This Does and Does Not Claim — supported vs unsupported product claims
Research Foundation — literature motivation for metrics
Versioning and Reproducibility — artifact versions and v1 legacy opt-in
Legal and Disclaimer — not investment advice