Methods / Audit

Leakage-Safe Research Design

The 50-company public package is designed as an auditable empirical finance workflow: filings are timestamped, features are fit only on training windows, models are tuned only on validation windows, and portfolio outputs are treated as diagnostics.

Method pipeline

From Filing Time to Claim Boundary

SEC 10-K: filing time and sections
Event labels: future volatility and CAR
Rolling OOS: train / validation / test
Text features: LM tone + TF-IDF/SVD
Audit reports: coverage and boundaries

Controls

Research Controls

Event-Time Alignment

Filing timestamps are mapped to prediction times before labels are constructed, so future price windows cannot enter features.
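The alignment rule can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the function names, the 16:00 cutoff, and the assumption that timestamps are already in exchange-local time are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date, datetime, time

MARKET_CLOSE = time(16, 0)  # assumed cutoff, not taken from the package


def prediction_date(accepted_at, trading_days):
    """First trading day on which the filing text may be used as a feature.

    trading_days must be a sorted list of datetime.date objects.
    """
    filing_day = accepted_at.date()
    i = bisect_left(trading_days, filing_day)
    same_day = i < len(trading_days) and trading_days[i] == filing_day
    if same_day and accepted_at.time() >= MARKET_CLOSE:
        i += 1  # filed at or after the close: usable from the next session
    if i >= len(trading_days):
        raise ValueError("no trading day on or after the filing")
    return trading_days[i]


def label_window(pred_day, trading_days, horizon):
    """Trading days strictly after pred_day that the forward label may use."""
    start = bisect_right(trading_days, pred_day)
    window = trading_days[start:start + horizon]
    if len(window) < horizon:
        raise ValueError("not enough future trading days for the label")
    return window
```

Because the label window starts strictly after the prediction date, no price used in a label can coincide with, or precede, the moment the features are fixed.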

Rolling Splits

Experiments use rolling train, validation, and test windows, and keep purge records so that forward-label overlap between windows can be checked.
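One way to generate such windows and purge records is sketched below. The window lengths, the `label_end` field, and both function names are assumptions made for illustration; the package's actual split configuration is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    train: tuple
    validation: tuple
    test: tuple


def rolling_splits(periods, train_len, val_len, test_len):
    """Build consecutive train/validation/test windows, stepping by test_len."""
    width = train_len + val_len + test_len
    splits = []
    for start in range(0, len(periods) - width + 1, test_len):
        val_start = start + train_len
        test_start = val_start + val_len
        splits.append(Split(
            train=tuple(periods[start:val_start]),
            validation=tuple(periods[val_start:test_start]),
            test=tuple(periods[test_start:start + width]),
        ))
    return splits


def purge_overlaps(train_rows, next_window_start):
    """Separate training rows into kept rows and purge records.

    A row is purged when its forward-label window ends on or after the
    first prediction date of the next window (hypothetical 'label_end' key).
    """
    kept, purged = [], []
    for row in train_rows:
        (purged if row["label_end"] >= next_window_start else kept).append(row)
    return kept, purged
```

Returning the purged rows rather than silently dropping them is what makes the overlap check auditable after the fact.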

Train-Window-Only TF-IDF

TF-IDF/SVD vocabularies are fit inside each training window and tracked with vocabulary manifests and hashes.
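The idea of a train-window-only vocabulary with a hash can be illustrated with a deliberately simplified, standard-library TF-IDF. It is a stand-in for whatever vectorizer the package uses, it omits the SVD step, and the whitespace tokenizer, smoothing formula, and manifest layout are assumptions.

```python
import hashlib
import json
import math
from collections import Counter


def fit_tfidf(train_docs):
    """Fit vocabulary and IDF weights on training-window documents only."""
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc.lower().split()))
    n_docs = len(train_docs)
    vocab = sorted(doc_freq)
    # Smoothed IDF; the exact formula is an illustrative choice.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    manifest = json.dumps(vocab, separators=(",", ":")).encode("utf-8")
    return {
        "vocab": vocab,
        "idf": idf,
        "vocab_sha256": hashlib.sha256(manifest).hexdigest(),
    }


def transform(model, doc):
    """Vectorize a document with the frozen vocabulary; unseen tokens are dropped."""
    counts = Counter(t for t in doc.lower().split() if t in model["idf"])
    return [counts[t] * model["idf"][t] for t in model["vocab"]]
```

Validation and test filings pass only through `transform`, so a token that first appears after the training window cannot enter the feature space, and the stored hash lets a reviewer confirm which vocabulary produced a given prediction.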

Validation-Only Tuning

Ridge and XGBoost hyperparameters are selected on validation Rank IC; test metrics are not used for tuning.
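The selection discipline can be expressed as a small helper that never receives test data. The helper name and its callable arguments are hypothetical; `fit` is assumed to train on the training window and `validation_score` to return a validation-window score such as Rank IC.

```python
def select_on_validation(param_grid, fit, validation_score):
    """Return the hyperparameters with the highest validation score.

    The function has no access to test data, so test metrics cannot
    influence which configuration is chosen.
    """
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        model = fit(params)               # trained on the training window
        score = validation_score(model)   # scored on the validation window
        if score > best_score:
            best_params, best_score = params, score
    if best_params is None:
        raise ValueError("param_grid is empty")
    return best_params, best_score
```

The test window is then scored once, with the selected parameters, after this function returns.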

Preregistered Rules

Primary prediction and portfolio specifications are separated from robustness and exploratory comparisons.

Multiple Testing Disclosure

The run reports 568 tested specifications with Bonferroni, Holm, and Benjamini-Hochberg FDR adjustments.
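For reference, the three adjustments can be written out directly. This is a plain-Python sketch of the textbook procedures, not the package's reporting code, and it says nothing about how the 568 specifications are grouped into families.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values (controls family-wise error)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (controls FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, pvalues[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted
```

Bonferroni and Holm bound the chance of any false positive, with Holm never more conservative than Bonferroni, while Benjamini-Hochberg bounds the expected share of false positives among rejections.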

Audit chart

Coverage Waterfall

Raw label coverage is computed over all labels, including those outside the configured out-of-sample (OOS) windows. Eligible OOS prediction coverage is the relevant model-completeness metric.

Raw labels: 48.7%
Eligible OOS: 100%
Model expected: 100%
Primary specs: 2 / 2
Audit failures: 0
Audit warnings: 2

Prediction Boundary

The preregistered primary prediction provides exploratory volatility-forecasting evidence: Ridge on realized_volatility_1_20 reaches a Rank IC of 0.2606 with a raw (unadjusted) p-value of 0.00017.
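Assuming the conventional definition, Rank IC is the Spearman rank correlation between predicted and realized values. The sketch below computes it with the standard library; the function names are illustrative, and how the package aggregates Rank IC across dates or windows is not specified here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized labels."""
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(predicted), average_ranks(realized))
```

Because only ranks enter the calculation, the metric rewards ordering firms correctly by future volatility and is insensitive to the scale of the predictions.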

Trading Boundary

Portfolio results remain diagnostic. The current package does not establish formal tradable alpha, investment advice, or CRSP/WRDS-equivalent asset-pricing evidence.