IDX Ownership Data Pipeline · Marvel Harisson

Problem

Indonesia Stock Exchange (IDX) disclosure PDFs contain useful ownership information, but it is hard to analyze because it is locked inside semi-structured documents.

Solution

I built a Python automation pipeline that searches disclosures, downloads relevant PDFs, reconstructs ownership tables, validates extracted information, and exports structured Excel workbooks.
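
The staged flow described above can be sketched as a small runner that passes each disclosure through the stages in order and records failures instead of stopping the whole batch. This is only an illustration under stated assumptions: `run_pipeline`, the stage names, and the failure-record layout are hypothetical and are not taken from the project code.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical type: a stage is a (name, function) pair.
Stage = Tuple[str, Callable[[object], object]]


def run_pipeline(
    items: Iterable[object], stages: Sequence[Stage]
) -> Tuple[List[object], List[Tuple[object, str, str]]]:
    """Run every item through all stages in order.

    Returns (results, failures). A failure is recorded as
    (original item, name of the stage that raised, error message),
    so one bad document does not abort the remaining ones.
    """
    results: List[object] = []
    failures: List[Tuple[object, str, str]] = []
    for item in items:
        value = item
        stage_name = "<none>"
        try:
            for stage_name, stage in stages:
                value = stage(value)
        except Exception as exc:  # broad on purpose: isolate per-document errors
            failures.append((item, stage_name, str(exc)))
            continue
        results.append(value)
    return results, failures
```

In a real run the stages would be the search, download, table reconstruction, validation, and export steps; here they are left as plain callables so the sketch stays independent of any library.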

What I Built

Browser automation for finding documents
PDF download and parsing workflow
Positional table reconstruction
Data-validation steps and confidence warnings
Excel workbook export
Streamlit interface
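
As a rough illustration of positional table reconstruction, the sketch below groups words into rows by vertical position and then assigns each word to a column by its left edge. It assumes word dictionaries with `text`, `x0`, and `top` keys, which is the shape pdfplumber's `extract_words()` returns. The function names, the `y_tolerance` value, and the fixed `column_edges` are assumptions made for the example, not the project's actual logic.

```python
from typing import Dict, List, Sequence

Word = Dict[str, object]  # expects "text", "x0", and "top" keys


def group_rows(words: Sequence[Word], y_tolerance: float = 3.0) -> List[List[Word]]:
    """Group words whose `top` lies within y_tolerance of the row's first word."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def assign_columns(row: Sequence[Word], column_edges: Sequence[float]) -> List[str]:
    """Place each word in a column based on how many edges lie at or left of x0."""
    cells: List[List[str]] = [[] for _ in range(len(column_edges) + 1)]
    for word in sorted(row, key=lambda w: w["x0"]):
        index = sum(1 for edge in column_edges if word["x0"] >= edge)
        cells[index].append(str(word["text"]))
    return [" ".join(cell) for cell in cells]


def reconstruct_table(
    words: Sequence[Word], column_edges: Sequence[float]
) -> List[List[str]]:
    """Rebuild a table (list of rows, each a list of cell strings) from words."""
    return [assign_columns(row, column_edges) for row in group_rows(words)]
```

Fixed column edges keep the example short. Real disclosures would likely need the edges inferred per page or per table, since layouts vary between documents.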

Technical Details

Python
Playwright
pdfplumber
Pandas
OpenPyXL
Streamlit

What I Learned

Real-world PDFs are messy and inconsistent.
Validation is essential when extracting data automatically.
The final output should be useful for non-technical users.
Automation projects need explicit error handling.
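
The points on validation and explicit error handling can be illustrated with a minimal check that collects warnings rather than raising, so questionable rows are flagged for review instead of being silently dropped or crashing the run. The names `ValidationResult`, `parse_percent`, and `validate_ownership`, the 0.5-point tolerance, and the rule that ownership percentages should sum to about 100 are assumptions for this sketch, not the project's actual checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class ValidationResult:
    rows: List[dict] = field(default_factory=list)     # rows that parsed cleanly
    warnings: List[str] = field(default_factory=list)  # human-readable issues


def parse_percent(raw: str) -> Optional[float]:
    """Parse '12,50%' or '12.50' into a float; return None if unreadable."""
    try:
        return float(raw.strip().rstrip("%").replace(",", "."))
    except (ValueError, AttributeError):
        return None


def validate_ownership(
    rows: Sequence[Tuple[str, str]], tolerance: float = 0.5
) -> ValidationResult:
    """Check (holder, percent) pairs and report problems as warnings."""
    result = ValidationResult()
    for number, (holder, raw_percent) in enumerate(rows, start=1):
        if not holder.strip():
            result.warnings.append(f"row {number}: missing holder name")
        percent = parse_percent(raw_percent)
        if percent is None:
            result.warnings.append(
                f"row {number}: unreadable percentage {raw_percent!r}"
            )
            continue
        if not 0 <= percent <= 100:
            result.warnings.append(f"row {number}: percentage {percent} out of range")
            continue
        result.rows.append({"holder": holder.strip(), "percent": percent})
    total = sum(row["percent"] for row in result.rows)
    if abs(total - 100) > tolerance:
        result.warnings.append(f"percentages sum to {total:.2f}, expected about 100")
    return result
```

Returning warnings alongside the cleaned rows lets them be shown next to the exported data, which suits non-technical users who need to know which figures to double-check.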