⚙️ How the pipeline works
SEC EDGAR → PDF Download → MinerU Extraction → Qwen3-VL-32B Classification & Parsing → Qwen3-VL-4B Verification → HF Dataset
1 · Vision Extraction: MinerU converts PDFs to structured images, preserving table layouts.
2 · Classification + Parsing: Qwen3-VL-32B identifies the Summary Compensation Table and parses it into typed JSON.
3 · Quality Filtering: Fine-tuned Qwen3-VL-4B assigns a confidence score (0–1) for each extracted table.
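
As a rough illustration of how steps 2 and 3 might fit together downstream, the sketch below models a parsed table as a typed record and keeps only tables whose confidence score clears a cutoff. Everything here is an assumption made for illustration: the class names, the field names, the `filter_by_confidence` helper, and the 0.8 threshold are hypothetical and do not describe the dataset's actual schema or the pipeline's real filtering rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompensationRow:
    """One executive-year row of a Summary Compensation Table.

    Field names are hypothetical; the real dataset schema may differ.
    """
    name: str
    year: int
    salary: Optional[float] = None
    bonus: Optional[float] = None
    stock_awards: Optional[float] = None
    option_awards: Optional[float] = None
    total: Optional[float] = None


@dataclass
class ExtractedTable:
    """A parsed table plus the verifier's confidence score in [0, 1]."""
    source_pdf: str
    confidence: float
    rows: List[CompensationRow] = field(default_factory=list)


def filter_by_confidence(tables: List[ExtractedTable],
                         threshold: float = 0.8) -> List[ExtractedTable]:
    """Keep only tables whose confidence is at least `threshold`.

    The default of 0.8 is an arbitrary illustrative cutoff.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [t for t in tables if t.confidence >= threshold]


# Illustrative, made-up records (not real filings).
tables = [
    ExtractedTable("filing_a.pdf", 0.95,
                   [CompensationRow("Jane Doe", 2023, salary=500000.0)]),
    ExtractedTable("filing_b.pdf", 0.40),
]
kept = filter_by_confidence(tables)
print([t.source_pdf for t in kept])
```

A hard threshold is the simplest possible policy; keeping the score as a column and letting consumers choose their own cutoff would be an equally reasonable design.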
Compensation Breakdown
📋 Detailed Breakdown