Problem
Indonesia Stock Exchange (IDX) disclosure PDFs contain useful ownership information, but the data is difficult to analyze because it is trapped inside semi-structured documents.
Solution
I built a Python automation pipeline that searches disclosures, downloads relevant PDFs, reconstructs ownership tables, validates extracted information, and exports structured Excel workbooks.
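As a rough illustration of how such stages can be chained, here is a minimal sketch that runs a download step and a parse step per document and records failures instead of stopping the whole batch. It is not the project's actual code: `DocumentResult`, `process_all`, and the `download` and `parse` callables are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class DocumentResult:
    """Outcome of processing one disclosure (hypothetical structure)."""
    url: str
    rows: list = field(default_factory=list)
    error: Optional[str] = None


def process_all(
    urls: Iterable[str],
    download: Callable[[str], bytes],
    parse: Callable[[bytes], list],
) -> List[DocumentResult]:
    """Run download -> parse for each URL, recording failures per document."""
    results = []
    for url in urls:
        try:
            rows = parse(download(url))
            results.append(DocumentResult(url=url, rows=rows))
        except Exception as exc:
            # Keep the batch running and store the cause for later review.
            message = f"{type(exc).__name__}: {exc}"
            results.append(DocumentResult(url=url, error=message))
    return results
```

Passing the stages in as callables keeps the loop testable without a browser or real PDFs: stub functions can stand in for the download and parse steps.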
What I Built
- Browser automation for finding documents
- PDF download and parsing workflow
- Positional table reconstruction
- Data-validation steps and confidence warnings
- Excel workbook export
- Streamlit interface
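To make "positional table reconstruction" concrete, here is a minimal sketch of the general idea: group words into rows by their vertical position, then assign each word to a column by its horizontal position. The word dictionaries use the `text`, `x0`, and `top` keys that pdfplumber's `extract_words()` returns, but the function names, the tolerance, the column edges, and the sample values are all assumptions made for illustration, not the project's real logic or data.

```python
def rows_from_words(words, row_tolerance=3.0):
    """Group words into rows by vertical position, each row sorted left to right."""
    rows = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # A word joins the current row if it sits close to that row's first word.
        if rows and abs(word["top"] - rows[-1]["top"]) <= row_tolerance:
            rows[-1]["words"].append(word)
        else:
            rows.append({"top": word["top"], "words": [word]})
    return [sorted(r["words"], key=lambda w: w["x0"]) for r in rows]


def assign_columns(row, column_edges):
    """Put each word into the column whose [left, right) range contains its x0."""
    cells = [""] * (len(column_edges) - 1)
    for word in row:
        for i in range(len(cells)):
            if column_edges[i] <= word["x0"] < column_edges[i + 1]:
                cells[i] = (cells[i] + " " + word["text"]).strip()
                break
    return cells


# Made-up sample words; real input would come from a parsed PDF page.
words = [
    {"text": "PT", "x0": 10, "top": 100.2},
    {"text": "Contoh", "x0": 25, "top": 100.0},
    {"text": "1,000,000", "x0": 200, "top": 100.5},
    {"text": "12.50", "x0": 300, "top": 99.8},
    {"text": "Publik", "x0": 10, "top": 115.0},
    {"text": "7,000,000", "x0": 200, "top": 115.3},
    {"text": "87.50", "x0": 300, "top": 115.1},
]
column_edges = [0, 150, 280, 400]  # assumed x-boundaries of three columns

table = [assign_columns(row, column_edges) for row in rows_from_words(words)]
for line in table:
    print(line)
```

The row tolerance absorbs small baseline differences between words on the same line. Fixed column edges are the simplest option; inconsistent layouts would likely need edges inferred per document.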
Technical Details
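- Python: language for the whole pipeline
- Playwright: browser automation for finding documents
- pdfplumber: PDF parsing and word-position extraction
- Pandas: tabular handling of the extracted rows
- OpenPyXL: Excel workbook export
- Streamlit: user interface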
What I Learned
- Real-world PDFs are messy and inconsistent.
- Validation is essential when extracting data automatically.
- The final output should be useful for non-technical users.
- Automation projects need explicit error handling.
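The validation lesson can be illustrated with a small check that returns warnings instead of raising, so a non-technical user sees which rows to review. This is a sketch under stated assumptions: the `holder` and `percent` field names, the 0.5-point tolerance, and the rule that percentages should sum to about 100 are hypothetical, and a table that lists only some holders would need a different rule.

```python
def validate_ownership_rows(rows, tolerance=0.5):
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    total = 0.0
    for index, row in enumerate(rows, start=1):
        name = (row.get("holder") or "").strip()
        percent = row.get("percent")
        if not name:
            warnings.append(f"row {index}: missing holder name")
        if not isinstance(percent, (int, float)):
            warnings.append(f"row {index}: percentage is not numeric: {percent!r}")
            continue  # skip non-numeric values when summing
        if not 0 <= percent <= 100:
            warnings.append(f"row {index}: percentage out of range: {percent}")
        total += percent
    # Assumed rule: a complete ownership table should add up to roughly 100%.
    if rows and abs(total - 100.0) > tolerance:
        warnings.append(f"percentages sum to {total:.2f}, expected about 100")
    return warnings
```

Returning a list rather than raising on the first problem lets the export step attach every warning to the workbook, which suits readers who will never see a traceback.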
