Applied financial NLP research

Financial 10-K Text Agent

An auditable financial NLP research pipeline for testing whether SEC 10-K disclosures contain out-of-sample predictive information about future volatility and abnormal-return targets.

This project sits at the intersection of financial NLP, empirical asset pricing, rolling out-of-sample validation, and research auditing. It is not a retrieval-augmented generation (RAG) demo, a generic sentiment classifier, or an AI trading bot.

Scale

Current Release Metrics

Companies: 50
SEC 10-K filings: 500
Sample period: FY2016-FY2025
Labels: 1,500
Out-of-sample (OOS) predictions: 4,716
Feature records: 520k+
Tested specifications: 568
Eligible OOS coverage: 100%

Main finding

Preregistered Primary Prediction

The preregistered primary specification fits a Ridge regression to future 20-day realized volatility (realized_volatility_1_20) and evaluates it by Rank IC across all rolling out-of-sample splits (ALL_SPLITS).

Specification: Ridge -> realized_volatility_1_20
Rank IC: 0.2606
Raw p-value: 0.00017

This result is exploratory out-of-sample evidence that 10-K text features carry ranking information about future 20-day realized volatility. The reported p-value is raw (not adjusted for multiple testing), and the claim concerns prediction, not tradable alpha.
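Rank IC is the Spearman rank correlation between predicted and realized values within a cross-section of firms. The sketch below computes it per test split with tie-averaged ranks and then averages across splits. Averaging per-split ICs (rather than pooling every prediction into one correlation) is an assumption about how ALL_SPLITS is aggregated, and all function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either side is constant
    return cov / (vx * vy) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(realized))


def mean_rank_ic(splits):
    """splits maps a split id to (predicted, realized); returns (mean IC, per-split ICs)."""
    per_split = {sid: rank_ic(p, r) for sid, (p, r) in splits.items()}
    return mean(per_split.values()), per_split
```

Because only ranks enter the statistic, a model can score well on Rank IC while its predictions are mis-scaled, which is why a level metric such as RMSE is reported alongside it.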

Model comparison

Best Observed Exploratory Prediction

The strongest observed model-comparison result is reported separately from the preregistered primary claim.

Exploratory model-comparison result
Model: XGBoost
Target: realized_volatility_1_20
Rank IC: 0.3133
Newey-West (NW) t-stat: 6.8479
RMSE: 0.00834

Because XGBoost is the best of several models compared after the fact, this result is exploratory model-comparison evidence and does not replace the preregistered primary claim.
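A Newey-West t-statistic tests whether the mean of a time series of per-period Rank ICs differs from zero while allowing for autocorrelation between periods. This is a minimal sketch using the Bartlett kernel; the lag length, and whether the correction is applied to per-split or per-date IC series, are assumptions rather than the project's documented settings.

```python
from statistics import mean


def newey_west_tstat(series, lags):
    """t-statistic for H0: mean(series) == 0, using a Bartlett-kernel HAC variance."""
    n = len(series)
    if lags >= n:
        raise ValueError("lags must be smaller than the series length")
    mu = mean(series)
    e = [x - mu for x in series]
    # Lag-0 autocovariance, divided by n as in the usual HAC estimator.
    long_run_var = sum(v * v for v in e) / n
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)  # Bartlett weight keeps the estimate non-negative
        gamma = sum(e[t] * e[t - lag] for t in range(lag, n)) / n
        long_run_var += 2 * weight * gamma
    standard_error = (long_run_var / n) ** 0.5
    return mu / standard_error
```

With `lags=0` this reduces to the ordinary t-statistic computed with a population variance; positive autocorrelation in the IC series inflates the long-run variance and shrinks the t-statistic.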

Pipeline

From SEC Filing to Audited Result

1. SEC filings: official 10-K filings and their filing timestamps.

2. Parsing: extraction of the Business, Risk Factors, Legal Proceedings, and MD&A sections.

3. Labels: future volatility and abnormal-return targets.

4. Splits: rolling train / validation / test windows.

5. Features: LM dictionary tone plus TF-IDF/SVD features fitted on the training window only.

6. Models: baselines, Ridge, and XGBoost.

7. Diagnostics: Rank IC, Newey-West t-statistics, and portfolio diagnostics.

8. Audit: coverage reports, manifests, registry, and checksums.
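The rolling windows in the Splits step can be sketched as fixed-length blocks of fiscal years that move forward one year at a time. The three-year training window, one-year validation window, one-year test window, and the `Split` type are hypothetical choices for illustration, not the project's documented configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    train: tuple
    validation: tuple
    test: tuple


def rolling_splits(first_year, last_year, train_len=3, val_len=1, test_len=1):
    """Return time-ordered train/validation/test windows that roll forward one year at a time."""
    width = train_len + val_len + test_len
    splits = []
    # The last start year is the one whose test window ends exactly at last_year.
    for start in range(first_year, last_year - width + 2):
        train = tuple(range(start, start + train_len))
        validation = tuple(range(start + train_len, start + train_len + val_len))
        test = tuple(range(start + train_len + val_len, start + width))
        splits.append(Split(train, validation, test))
    return splits


def is_time_ordered(split):
    """Leakage guard: all training years precede validation, which precedes test."""
    return max(split.train) < min(split.validation) and max(split.validation) < min(split.test)
```

With FY2016-FY2025 and these hypothetical lengths, the function returns six splits whose test years run from FY2020 to FY2025. An expanding window (keeping every earlier year in training) is the usual alternative; a fixed-length window trades sample size for robustness to regime change.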

Research controls

Audit-Backed Workflow

Leakage Control

Rolling splits, filing-time alignment, label-window checks, and train-window-only vocabularies reduce look-ahead bias.
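The train-window-only vocabulary rule means the term list and IDF weights are fitted on training-window filings and then frozen when transforming validation and test filings, so later documents cannot influence earlier features. The pure-Python sketch below shows that rule; it omits the SVD step, and its tokenizer and smoothed IDF formula are simplifying assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(train_docs, min_df=1):
    """Fit vocabulary and smoothed IDF weights on training-window documents only."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
    return {
        term: math.log((1 + n) / (1 + count)) + 1
        for term, count in df.items()
        if count >= min_df
    }


def transform(doc, idf):
    """L2-normalized TF-IDF vector (as a dict) using the frozen training vocabulary.

    Terms that never appeared in the training window are dropped, not added.
    """
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}
```

For example, if "cybersecurity" first appears in a test-year filing, it contributes nothing to that filing's vector until a later training window includes it.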

Model Comparison

Historical mean, industry mean, Ridge, and XGBoost are compared under rolling out-of-sample evaluation.
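The two naive baselines can be sketched as follows, assuming the historical-mean baseline predicts each firm's average training-window target and the industry-mean baseline predicts the average training-window target of the firm's industry. The fallback from firm to industry to global mean for firms without history is a hypothetical choice.

```python
from collections import defaultdict
from statistics import mean


def fit_baselines(train_rows):
    """train_rows: iterable of (firm, industry, target) taken from the training window only."""
    by_firm, by_industry, all_targets = defaultdict(list), defaultdict(list), []
    for firm, industry, y in train_rows:
        by_firm[firm].append(y)
        by_industry[industry].append(y)
        all_targets.append(y)
    return (
        {f: mean(v) for f, v in by_firm.items()},
        {i: mean(v) for i, v in by_industry.items()},
        mean(all_targets),
    )


def historical_mean_prediction(firm, industry, fitted):
    firm_means, industry_means, global_mean = fitted
    # Hypothetical fallback chain for firms with no training-window history.
    return firm_means.get(firm, industry_means.get(industry, global_mean))


def industry_mean_prediction(industry, fitted):
    _, industry_means, global_mean = fitted
    return industry_means.get(industry, global_mean)
```

Baselines like these set the bar a text model must clear: realized volatility is persistent, so a firm's own history alone often ranks firms well.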

Multiple Testing

All 568 tested specifications are disclosed, together with Bonferroni, Holm, and Benjamini-Hochberg (FDR) adjusted p-values.
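The three corrections can be computed from a list of raw p-values using the textbook Bonferroni, Holm step-down, and Benjamini-Hochberg step-up formulas. This is a minimal sketch; the example p-values in the tests are made up, and the family of specifications over which the adjustment is applied is left to the caller.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1 (controls FWER)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values (controls FWER, uniformly less strict than Bonferroni)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the running minimum of p * m / rank.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted
```

Bonferroni and Holm control the chance of any false rejection across the family, while Benjamini-Hochberg controls the expected share of false rejections, so it is the least conservative of the three.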

Audit Trail

The package includes audit reports, coverage waterfalls, manifests, vocabulary hashes, prediction-scale checks, and checksums.
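A checksum manifest maps every artifact to a content hash so that a later run can detect altered or missing files. This is a minimal sketch using SHA-256; the function names and the example file name are hypothetical, not the package's actual manifest format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return manifest paths whose file is now missing or has a different digest."""
    current = build_manifest(root)
    return sorted(path for path, digest in manifest.items() if current.get(path) != digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "predictions.csv").write_text("firm,pred\nA,0.02\n")  # hypothetical artifact
        manifest = build_manifest(d)
        print(verify_manifest(d, manifest))  # [] while nothing has changed
```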

Usage Boundary

This is an applied-grade exploratory research run. It does not claim formal CRSP/WRDS-equivalent asset-pricing evidence, a survivorship-free research-grade universe, a production trading system, proven tradable alpha, or investment advice.

Portfolio outputs are diagnostic only. The preregistered primary portfolio specification did not establish formal tradable alpha.

Contribution

My Contributions

Research Design

Defined the empirical question, primary target, and preregistered result structure.

Data Engineering

Built document, price, label, and split artifacts across a 50-firm 10-K panel.
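Label construction depends on aligning each filing to the first trading day on which the market could react, then measuring the target strictly after that day. This minimal sketch assumes a hypothetical 16:00 exchange-local cutoff, a caller-supplied sorted trading calendar, and that realized_volatility_1_20 means the sample standard deviation of daily log returns on trading days 1 through 20 after day 0, not annualized. All of these are assumptions, not the project's confirmed definitions.

```python
import math
from bisect import bisect_left, bisect_right
from datetime import date, datetime, time
from statistics import stdev

MARKET_CLOSE = time(16, 0)  # hypothetical cutoff, exchange-local time


def event_day_index(accepted_at, trading_days):
    """Index of day 0: the first trading day whose close follows the filing's acceptance time.

    accepted_at: naive datetime in exchange-local time.
    trading_days: sorted list of datetime.date objects.
    """
    d = accepted_at.date()
    if accepted_at.time() < MARKET_CLOSE:
        i = bisect_left(trading_days, d)   # same day if it is a trading day
    else:
        i = bisect_right(trading_days, d)  # at or after the close: next trading day
    if i == len(trading_days):
        raise ValueError("filing is later than the last available trading day")
    return i


def realized_volatility_1_20(closes, day0):
    """Sample std of daily log returns on trading days 1..20 after day 0 (not annualized).

    closes[i] is the close on trading_days[i]. Returns None when the full
    20-day label window is not yet observable (a label-window check).
    """
    end = day0 + 20
    if end >= len(closes):
        return None
    returns = [math.log(closes[t] / closes[t - 1]) for t in range(day0 + 1, end + 1)]
    return stdev(returns)


if __name__ == "__main__":
    days = [date(2024, 2, 1), date(2024, 2, 2), date(2024, 2, 5)]
    print(days[event_day_index(datetime(2024, 2, 1, 17, 30), days)])  # 2024-02-02
```

Starting the return window at day 1 means the day-0 reaction itself never enters the label, which keeps the target strictly in the future relative to the information set.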

NLP Features

Implemented dictionary tone and train-window-only TF-IDF/SVD features.
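Dictionary tone counts words that appear on positive and negative word lists. This minimal sketch uses one common net-tone definition, (positive - negative) / (positive + negative), alongside per-word shares. The tiny word lists are placeholders, not the Loughran-McDonald dictionary, and the project's exact normalization is an assumption.

```python
import re

# Placeholder word lists for illustration only; a real run would load a full
# financial sentiment dictionary instead.
POSITIVE = {"gain", "improve", "strong", "growth"}
NEGATIVE = {"loss", "decline", "impairment", "litigation", "adverse"}


def tone_features(text):
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = len(words)
    return {
        "pos_share": pos / total if total else 0.0,
        "neg_share": neg / total if total else 0.0,
        # Net tone in [-1, 1]; 0.0 when no dictionary words appear.
        "net_tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
    }
```

For the sentence "Strong growth offset an impairment loss and adverse litigation." this gives two positive and four negative hits out of nine words, so net tone is -1/3.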

Modeling

Compared baselines, Ridge, and XGBoost under rolling OOS evaluation.
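For reference, Ridge regression minimizes squared error plus an L2 penalty on the coefficients. The standard objective and closed-form solution are below, written for a training-window feature matrix X, target vector y, and penalty λ. Centering and standardizing features (so the intercept is not penalized) and choosing λ on the validation window are assumptions, not documented settings.

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y, \qquad \lambda > 0,

\hat{y}_{\text{test}} = X_{\text{test}} \, \hat{\beta}_{\lambda}
```

The penalty keeps the estimate well defined even when the TF-IDF/SVD and tone features outnumber or nearly duplicate one another, which is common with a 50-firm panel.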

Research Audit

Added coverage, leakage, multiple-testing, and artifact-integrity reports.
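A coverage waterfall records how many records survive each eligibility filter, so that every dropped filing is accounted for. This minimal sketch assumes records are dicts and each filter is a named predicate; the filter names and fields in the example are hypothetical.

```python
def coverage_waterfall(records, filters):
    """Apply named filters in order and report how many records remain after each step.

    filters: list of (name, predicate) pairs.
    Returns (surviving_records, rows) where each row is (step, remaining, dropped).
    """
    remaining = list(records)
    rows = [("start", len(remaining), 0)]
    for name, keep in filters:
        kept = [r for r in remaining if keep(r)]
        rows.append((name, len(kept), len(remaining) - len(kept)))
        remaining = kept
    return remaining, rows


def eligible_coverage(n_predicted, n_eligible):
    """Share of eligible records that received an out-of-sample prediction."""
    return n_predicted / n_eligible if n_eligible else 0.0
```

Reporting coverage against the eligible set (the records left at the end of the waterfall) rather than against all filings is what makes a figure like 100% eligible OOS coverage interpretable.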

Interpretation

Reported volatility-prediction evidence while treating portfolio results as diagnostics.