50-company public result package

Results and Artifacts

A compact public summary of the latest applied-grade SEC 10-K text factor run: `50_company_public_fmp_alpha_2016_2025_v1`.

Run summary

50-Company Applied Pilot

Run ID: `50_company_public_fmp_alpha_2016_2025_v1`
Universe: 50 U.S. large-cap firms
Sample: FY2016-FY2025
SEC 10-K filings: 500
Labels: 1,500
Out-of-sample (OOS) predictions: 4,716
Feature records: 520,465
Tested specifications: 568
Preregistered primary result

Main Result

Ridge model on the `realized_volatility_1_20` target, evaluated by ALL_SPLITS Rank IC.

Rank IC: 0.2606
Raw p-value: 0.00017
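Rank IC is usually computed as the Spearman rank correlation between model scores and realized outcomes. The page does not say whether the ALL_SPLITS figure is one correlation over pooled out-of-sample rows or an average of per-period correlations. The minimal sketch below assumes the pooled version. It uses only the standard library, and its function names are hypothetical rather than the repository's API.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def rank_ic(predictions, realized):
    """Spearman rank correlation: Pearson correlation of the two rank vectors.

    Assumption: all OOS rows are pooled into a single correlation.
    """
    if len(predictions) != len(realized) or len(predictions) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predictions), average_ranks(realized))
```

Scores that order firms exactly as their later volatility does give a Rank IC of 1.0, and the reverse order gives -1.0. Because only ranks enter the calculation, the metric ignores the scale of the predictions.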
Exploratory comparison

Best Observed Prediction

XGBoost on `realized_volatility_1_20`, reported as model-comparison evidence rather than as the preregistered primary claim.

Rank IC: 0.3133
Newey-West t-stat: 6.8479
RMSE: 0.00834
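The page does not say which series the Newey-West t-stat is computed on, or which lag it uses. A common convention is to test the mean of a time series of per-period Rank IC values with a Bartlett-kernel HAC standard error. The sketch below assumes that convention. The function name and the `max_lag` choice are hypothetical.

```python
def newey_west_tstat(series, max_lag):
    """t-statistic of the sample mean using a Newey-West (Bartlett) standard error.

    Assumption: `series` is a time series such as per-period Rank IC values.
    """
    n = len(series)
    if n < 2 or not 0 <= max_lag < n:
        raise ValueError("need at least 2 observations and 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(lag):
        # Sample autocovariance at `lag`, normalized by n.
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = autocov(0)
    for lag in range(1, max_lag + 1):
        weight = 1 - lag / (max_lag + 1)  # Bartlett kernel weight
        long_run_var += 2 * weight * autocov(lag)
    if long_run_var <= 0:
        raise ValueError("long-run variance is not positive")
    return mean / (long_run_var / n) ** 0.5
```

With `max_lag=0` this reduces to an ordinary t-stat, using a variance that divides by n. Positive autocorrelation between neighbouring periods raises the long-run variance and therefore shrinks the t-stat.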

Model comparison

Rank IC by Model

A compact view of ALL_SPLITS test Rank IC on realized_volatility_1_20. Ridge is the preregistered primary model; XGBoost is exploratory.

XGBoost (exploratory text model): 0.3133
Industry mean benchmark: 0.2952
Ridge (preregistered primary): 0.2606
Historical mean benchmark: -0.0206

Negative values indicate that the score ranks future volatility in the opposite direction.
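The target name `realized_volatility_1_20` suggests realized volatility over trading days 1 through 20 after the event. This page does not give the exact definition. The sketch below therefore assumes:

- log returns;
- day 0 is the trading day the filing becomes public;
- the result is an unannualized sample standard deviation.

The function is hypothetical, not the package's label code.

```python
import math
import statistics


def realized_volatility(closes, start=1, end=20):
    """Sample std of daily log returns over trading days start..end (inclusive).

    Assumptions: closes[0] is the close on day 0 (the filing day), the return
    for day d is log(closes[d] / closes[d - 1]), and no annualization is applied.
    """
    if start < 1 or end <= start:
        raise ValueError("need 1 <= start < end")
    if len(closes) <= end:
        raise ValueError("not enough prices to cover the window")
    returns = [math.log(closes[d] / closes[d - 1]) for d in range(start, end + 1)]
    return statistics.stdev(returns)
```

Taking prices only from days strictly after day 0 keeps the label in the future relative to the filing. A predictor built from the 10-K text can then be scored against it without look-ahead.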

Audit

Coverage and Controls

Eligible OOS coverage: 100%
Audit failures: 0
Audit warnings: 2
Tested specifications: 568
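A Bonferroni correction gives a rough sense of why the tested-specification count matters. Bonferroni is a conservative adjustment chosen here only for illustration; the page does not say which adjustment, if any, the audit applies. Scaling the raw p-value of 0.00017 by a family of 2 primary specifications gives 0.00017 × 2 = 0.00034. Scaling it by all 568 tested specifications gives 0.00017 × 568 = 0.09656. The helper below is hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: p multiplied by the family size, capped at 1."""
    if not 0 <= p_value <= 1 or n_tests < 1:
        raise ValueError("need 0 <= p_value <= 1 and n_tests >= 1")
    return min(1.0, p_value * n_tests)


raw_p = 0.00017  # raw p-value reported for the preregistered primary result
primary_family = bonferroni(raw_p, 2)    # family: the 2 primary specifications
full_family = bonferroni(raw_p, 568)     # family: all 568 tested specifications
```

The choice of family drives the result, which is why the preregistered primary specification is kept separate from the exploratory search.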

The audit trail separates raw label coverage from eligible OOS prediction coverage, discloses the extent of multiple testing, and records the applied-grade data boundary.

Raw label coverage: 48.7%
Eligible OOS coverage: 100%
Primary specifications: 2 / 2
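The two coverage figures use different denominators. Raw coverage counts every candidate row, while eligible OOS coverage counts only rows that have a label and fall in a test split. A low raw percentage can therefore sit next to full eligible coverage. The sketch below uses hypothetical row fields, not the package's schema.

```python
from dataclasses import dataclass


@dataclass
class Row:
    has_label: bool       # hypothetical: a realized label could be computed
    in_test_split: bool   # hypothetical: the row falls in an out-of-sample test fold
    has_prediction: bool  # hypothetical: the model produced an OOS prediction


def coverage(rows):
    """Return (raw label coverage, eligible OOS prediction coverage)."""
    if not rows:
        raise ValueError("no rows")
    raw = sum(r.has_label for r in rows) / len(rows)
    eligible = [r for r in rows if r.has_label and r.in_test_split]
    if not eligible:
        return raw, float("nan")
    oos = sum(r.has_prediction for r in eligible) / len(eligible)
    return raw, oos
```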

Data Boundary

SEC EDGAR provides official 10-K filings and filing timestamps. Market data uses a mixed FMP/Yahoo public-source stack, and market-cap-at-selection values are applied-grade estimates.

This is not a CRSP/WRDS-equivalent survivorship-free replication.

Interpretation Policy

Treat this package as evidence of an auditable financial NLP research workflow and as exploratory evidence on volatility prediction.

Portfolio outputs are diagnostic only; this package does not establish formal tradable alpha or provide investment advice.

Reproducibility

Run the Public Code Locally

The public repository can be cloned, installed, linted, and tested without private data. Full real-data runs require API keys and local private data directories.

git clone https://github.com/uiclxh/financial-10k-text-agent.git
cd financial-10k-text-agent
python -m pip install -e ".[dev]"
python -m ruff check .
python -m pytest

Open-source code is released under the MIT License.