Projects

IDX Ownership Data Pipeline

Python Data Automation · 2026

A Python automation pipeline that searches Indonesian Stock Exchange (IDX) disclosures, parses the PDFs, reconstructs ownership tables, validates the extracted data, and exports structured Excel workbooks.

Python · PDF Parsing · Data Automation · Excel · Streamlit · Playwright

Screenshot: the IDX disclosure page filtered for shareholder announcements above five percent, with several PDF results listed.

Problem

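Indonesian Stock Exchange disclosure PDFs contain useful ownership information, such as which shareholders hold more than five percent of a company. That data is hard to analyze, however, because it is locked inside semi-structured documents rather than tables that can be filtered or loaded into a spreadsheet.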

Solution

I built a Python automation pipeline that searches IDX disclosures, downloads the relevant PDFs, rebuilds ownership tables from the positions of text on each page, validates the extracted values, and exports the results as structured Excel workbooks.

What I Built

  • Browser automation for finding documents
  • PDF download and parsing workflow
  • Positional table reconstruction
  • Data-validation steps and confidence warnings
  • Excel workbook export
  • Streamlit interface
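The positional table reconstruction step can be sketched as follows. This is a minimal illustration under stated assumptions and is not the project's actual code. It assumes word boxes shaped like the dictionaries returned by pdfplumber's `Page.extract_words()`, of which only the `text`, `x0`, and `top` keys are used. Words are grouped into rows by vertical position. Each word is then assigned to a column using x-coordinate boundaries, which here are hard-coded and hypothetical; a real pipeline would need to infer them, for example from the header row. The function names, the 3-point tolerance, and the sample words are all illustrative.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Sequence

# Word box shaped like pdfplumber's Page.extract_words() output; only the
# "text", "x0" and "top" keys are used here.
Word = Dict[str, Any]


def group_rows(words: Sequence[Word], y_tolerance: float = 3.0) -> List[List[Word]]:
    """Cluster words into rows: a word joins the current row when its `top`
    is within `y_tolerance` points of the row's first (topmost) word."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def assign_columns(row: Sequence[Word], boundaries: Sequence[float]) -> List[str]:
    """Put each word in a column by its x0. `boundaries` are the sorted left
    edges of columns 2..n; words sharing a cell are joined left to right."""
    cells: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for word in sorted(row, key=lambda w: w["x0"]):
        cells[bisect_right(boundaries, word["x0"])].append(str(word["text"]))
    return [" ".join(parts) for parts in cells]


def reconstruct_table(words: Sequence[Word], boundaries: Sequence[float],
                      y_tolerance: float = 3.0) -> List[List[str]]:
    return [assign_columns(row, boundaries) for row in group_rows(words, y_tolerance)]


# Made-up word boxes: two shareholder rows whose `top` values jitter
# slightly, as they can in real PDFs.
words = [
    {"text": "PT", "x0": 50.0, "top": 100.0},
    {"text": "Alpha", "x0": 65.0, "top": 100.4},
    {"text": "1.250.000", "x0": 300.0, "top": 100.2},
    {"text": "12,50", "x0": 420.0, "top": 99.8},
    {"text": "PT", "x0": 50.0, "top": 120.0},
    {"text": "Beta", "x0": 65.0, "top": 120.1},
    {"text": "600.000", "x0": 305.0, "top": 119.9},
    {"text": "6,00", "x0": 425.0, "top": 120.3},
]
print(reconstruct_table(words, boundaries=[250.0, 400.0]))
# [['PT Alpha', '1.250.000', '12,50'], ['PT Beta', '600.000', '6,00']]
```

Each word is compared with the first word of its row rather than the previous word. This prevents a gradual drift in `top` from chaining two separate rows together.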

Technical Details

  • Python
  • Playwright
  • pdfplumber
  • Pandas
  • OpenPyXL
  • Streamlit

What I Learned

  • Real-world PDFs are messy and inconsistent.
  • Validation is essential when extracting data automatically.
  • The final output should be useful for non-technical users.
  • Automation projects need explicit error handling.
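The lesson about validation can be illustrated with a small check of the kind that could produce the pipeline's confidence warnings. This is a hedged sketch and is not the project's code. The names `parse_id_number`, `OwnershipRow`, and `validate_rows` are hypothetical, as is the 0.05-percentage-point tolerance. The total number of issued shares is assumed to come from another source. The parser also assumes Indonesian number formatting, with `.` as the thousands separator and `,` as the decimal mark; a given filing may not follow that convention. Each check returns a warning instead of raising an exception, so a questionable row is flagged for review rather than silently dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


def parse_id_number(text: str) -> Optional[float]:
    """Parse an Indonesian-formatted number such as '1.250.000' or '12,50%'.
    Assumes '.' groups thousands and ',' marks decimals; returns None when the
    text does not parse, so the caller can flag it instead of crashing."""
    cleaned = text.strip().rstrip("%").replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


@dataclass
class OwnershipRow:
    holder: str
    shares_text: str   # share count as printed in the PDF
    percent_text: str  # ownership percentage as printed in the PDF


def validate_rows(rows: List[OwnershipRow], total_shares: float,
                  tolerance_pp: float = 0.05) -> List[str]:
    """Return human-readable warnings; an empty list means every check passed."""
    if total_shares <= 0:
        raise ValueError("total_shares must be positive")
    warnings: List[str] = []
    percent_sum = 0.0
    for row in rows:
        shares = parse_id_number(row.shares_text)
        percent = parse_id_number(row.percent_text)
        if shares is None or percent is None:
            warnings.append(f"{row.holder}: could not parse numbers")
            continue
        percent_sum += percent
        # Cross-check: the stated percentage should match the share count.
        implied = shares / total_shares * 100
        if abs(implied - percent) > tolerance_pp:
            warnings.append(
                f"{row.holder}: stated {percent:.2f}% but shares imply {implied:.2f}%"
            )
    if percent_sum > 100 + tolerance_pp:
        warnings.append(f"percentages sum to {percent_sum:.2f}%, above 100%")
    return warnings


rows = [
    OwnershipRow("PT Alpha", "1.250.000", "12,50"),
    OwnershipRow("PT Beta", "600.000", "6,50"),  # inconsistent on purpose
    OwnershipRow("PT Gamma", "n/a", "5,10"),     # unparseable on purpose
]
for warning in validate_rows(rows, total_shares=10_000_000):
    print(warning)
# PT Beta: stated 6.50% but shares imply 6.00%
# PT Gamma: could not parse numbers
```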