Auditable financial NLP research · v4 evidence

Financial 10-K Text Agent

10-K text ranks future volatility but does not prove tradable alpha.

I designed and built an end-to-end research pipeline to test whether SEC 10-K disclosures contain out-of-sample information about future 20-day realized volatility.


SEC 10-K filings: 500

Preregistered primary Rank IC: 0.2395

Critical audit failures: 0

Disclosed scope warnings: 2

8,133 out-of-sample (OOS) predictions. Best exploratory Rank IC: 0.3668, reported separately from the preregistered primary result.

For admissions reviewers

The Project in 60 Seconds

Five answers connect the finance question, my contribution, the evidence, and its limits.


01 · Research question: Does 10-K language contain forward-looking risk information?

I test whether filing text ranks companies by future 20-day realized volatility out of sample.

02 · What I built: An end-to-end, leakage-controlled research workflow

SEC ingestion, section review, forward labels, rolling splits, train-only text features, models, diagnostics, and audit reports.

03 · Best-supported finding: Preregistered Ridge Rank IC = 0.2395

The fixed 50-company panel shows positive OOS information for volatility ranking.

04 · What is not established: No evidence of deployable trading alpha

The preregistered portfolio Sharpe is -0.8539; portfolio evidence is diagnostic only.

05 · Evidence boundary: Applied-grade public-data pilot, not a CRSP/WRDS-equivalent study

The universe is a fixed active-company panel and market data use a mixed public-source stack. These limits are disclosed rather than hidden.
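The forward label behind the research question can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: it assumes realized volatility is the sample standard deviation of daily log returns over the 20 trading days after the filing, annualized with 252 days. The source does not state the exact definition, and the function name `forward_realized_vol` and its parameters are hypothetical.

```python
import math
from statistics import stdev


def forward_realized_vol(closes, event_idx, horizon=20, annualize=252):
    """Annualized std. dev. of the `horizon` daily log returns after `event_idx`.

    closes: daily closing prices in date order.
    event_idx: index of the last close known at filing time (assumption).
    Returns None when fewer than `horizon` future returns exist.
    """
    window = closes[event_idx : event_idx + horizon + 1]
    if len(window) < horizon + 1:
        return None  # not enough future data: leave the label missing
    log_returns = [math.log(b / a) for a, b in zip(window, window[1:])]
    return stdev(log_returns) * math.sqrt(annualize)


# Hypothetical price path that alternates between two levels.
closes = [100.0 if i % 2 == 0 else 110.0 for i in range(30)]
label = forward_realized_vol(closes, event_idx=0)
```

Because the window starts at the filing-time close and only looks forward, the label never uses prices from before the filing, which is what makes it a forward target.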

How to read the evidence

Primary Claims Come Before Model Exploration

The specification registry separates the two preregistered tests from robustness checks and exploratory comparisons.

Preregistered

1 prediction + 1 portfolio test

Primary prediction: Rank IC 0.2395. Primary portfolio: Sharpe -0.8539.

Diagnostics & exploration

594 specifications across 26 families

Ablations, baselines, neutralization, return targets, and model comparisons are reported separately.

Audit boundary

0 critical failures · 2 scope warnings

The warnings disclose public-data and universe limitations; they do not convert exploratory evidence into a formal claim.

Current release: 50_company_public_fmp_alpha_2016_2025_v4

Result snapshot

Volatility Ranking Evidence

The strongest observed result is exploratory; the preregistered Ridge result remains positive.

TF-IDF/SVD only (exploratory best): 0.3668

Industry + text (incremental diagnostic): 0.3296

Industry mean (economic baseline): 0.2924

Combined text Ridge (preregistered primary): 0.2395

Primary prediction: Rank IC 0.2395

Raw p-value 0.00067. Positive exploratory volatility-ranking evidence.

Primary portfolio: Sharpe -0.8539

Raw p-value 0.1147. Tradable alpha is not established.
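For readers unfamiliar with the headline metric, Rank IC is conventionally the Spearman rank correlation between predicted and realized values. The sketch below computes it for one cross-section as the Pearson correlation of average ranks. How the project aggregates across test windows is not stated in the source, so no aggregation is shown; the helper names and the sample numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and outcomes."""
    return pearson(average_ranks(predicted), average_ranks(realized))


# Hypothetical cross-section of four companies.
predicted_vol = [0.10, 0.30, 0.20, 0.40]
realized_vol = [0.12, 0.25, 0.30, 0.50]
print(rank_ic(predicted_vol, realized_vol))  # 0.8
```

Only the ordering matters here: the two middle companies are swapped relative to the realized ranking, which lowers the score from 1.0 to 0.8 regardless of the prediction scale.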

My contribution

What I Designed and Built

I developed the workflow from SEC filing ingestion to audited out-of-sample evidence.

Research design

Rolling train/validation/test splits, forward labels, preregistered specifications, and embargo-based leakage controls.
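The split-and-embargo idea can be illustrated with integer time periods. This is a sketch under stated assumptions: the window sizes, the step, and the function name `rolling_splits` are hypothetical, and the source does not specify the embargo length. The only property it aims to show is that a gap separates consecutive blocks, so a label whose forward window runs past the end of one block cannot overlap the next.

```python
def rolling_splits(n_periods, train_size, val_size, test_size, embargo=1):
    """Yield (train, val, test) index ranges that roll forward through time.

    `embargo` periods are skipped between consecutive blocks.
    """
    span = train_size + val_size + test_size + 2 * embargo
    start = 0
    while start + span <= n_periods:
        train_end = start + train_size
        val_start = train_end + embargo
        val_end = val_start + val_size
        test_start = val_end + embargo
        test_end = test_start + test_size
        yield (
            range(start, train_end),
            range(val_start, val_end),
            range(test_start, test_end),
        )
        start += test_size  # advance so test blocks never overlap


# Illustrative sizes only: 12 periods, 4 train, 1 validation, 1 test.
for train, val, test in rolling_splits(12, 4, 1, 1, embargo=1):
    print(list(train), list(val), list(test))
```

With a 20-day forward label, an embargo at least as long as the label horizon is the usual reasoning for choosing the gap.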


Financial NLP pipeline

Section parsing, Loughran-McDonald tone features, train-window-only TF-IDF/SVD, and model manifests.
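The "train-window-only" constraint means the vocabulary and document-frequency weights are learned from training filings alone and then applied unchanged to later filings. The sketch below shows that fit/transform separation with a deliberately small pure-Python TF-IDF. It is a stand-in, not the project's implementation: the SVD and model steps are omitted, the tokenizer is a plain whitespace split, the function names are hypothetical, and the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_tfidf(train_docs):
    """Learn vocabulary and smoothed IDF weights from training documents only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def transform_tfidf(docs, idf):
    """Map documents to L2-normalized TF-IDF dicts; terms unseen in training are dropped."""
    vectors = []
    for doc in docs:
        counts = Counter(t for t in tokenize(doc) if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Hypothetical snippets standing in for filing sections.
train_docs = ["liquidity risk increased", "litigation risk remains"]
test_docs = ["cyber risk increased"]

idf = fit_tfidf(train_docs)                    # fitted inside the training window
test_vectors = transform_tfidf(test_docs, idf)  # "cyber" is ignored: not in training vocabulary
```

Refitting `idf` inside each rolling training window, rather than once on the full corpus, is what keeps test-period language from influencing the features.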


Evaluation and audit

Rank IC, feature ablation, industry-neutral diagnostics, clustered bootstrap, coverage checks, and automated reports.
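A clustered bootstrap resamples whole groups of observations rather than individual rows, so dependence within a group is preserved in every replicate. The sketch below assumes clusters are companies and uses a percentile interval; the source does not say which clustering unit or interval type the project uses. The function name, the replicate count, and the sample data are hypothetical, and the statistic is passed in as a callable (a mean here, where the pipeline would presumably use Rank IC).

```python
import random
from statistics import mean


def clustered_bootstrap_ci(clusters, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling whole clusters with replacement.

    clusters: dict mapping a cluster id (e.g. a ticker) to its list of observations.
    statistic: callable taking a flat list of observations.
    """
    rng = random.Random(seed)
    ids = list(clusters)
    draws = []
    for _ in range(n_boot):
        sampled_ids = [rng.choice(ids) for _ in ids]  # same number of clusters each time
        pooled = [obs for cid in sampled_ids for obs in clusters[cid]]
        draws.append(statistic(pooled))
    draws.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return draws[lo_idx], draws[hi_idx]


# Hypothetical per-company observations.
clusters = {
    "AAA": [0.1, 0.2, 0.3],
    "BBB": [0.4, 0.5],
    "CCC": [0.0, 0.1, 0.2, 0.3],
}
low, high = clustered_bootstrap_ci(clusters, mean, n_boot=500, seed=1)
```

Resampling rows independently would treat repeated filings from one company as unrelated draws and would typically understate the interval width.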


Python · scikit-learn · XGBoost · SEC EDGAR · pytest · Ruff · GitHub Actions

Why it matters

Beyond Sentiment Scores and Document Search

Forward labels

Links filing text to future volatility and abnormal-return targets.

Rolling OOS design

Separates training, validation, and test windows through time.

Train-only features

Fits TF-IDF/SVD vocabularies only inside each training window.

Evidence hierarchy

Separates preregistered claims from exploratory model comparisons.

Research audit

Discloses parser issues, bootstrap uncertainty, coverage, and multiple testing.
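Multiple testing matters because many specifications are compared, and some small raw p-values are expected by chance alone. The source does not say which adjustment the audit applies, so the sketch below shows one common option, the Benjamini-Hochberg step-up procedure, purely as an illustration. The p-values are made up and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical raw p-values from ten exploratory specifications.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(raw_p))  # only the first two are rejected
```

In this example five raw p-values fall below 0.05, but only two survive the adjustment, which is the gap that reporting raw p-values alone would hide.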

Pipeline

From SEC Filing to Audited Evidence

SEC Filings: acceptance timestamps

Parser Review: section quality flags

Labels: future targets

Rolling Splits: embargo + OOS

Text Features: train-window only

Diagnostics: ablation + bootstrap

Audit: claims + boundaries

Inspect the evidence

Follow the Public Audit Trail

Compact, license-safe artifacts connect every headline number to a reproducible evidence file.

Working Paper: stable SSRN publication page

v4 Result Package: all public artifacts

Factor Card: fastest result summary

Feature Ablation: text versus industry

Bootstrap Report: clustered confidence intervals

Parser Review: manual quality appendix

Usage Boundary

This is an applied-grade exploratory run, not CRSP/WRDS-equivalent formal asset-pricing evidence, a survivorship-free replication, a production trading system, proven tradable alpha, or investment advice.