Lakehouse Migration for SaaS ERP
SaaS ERP Provider
Challenge
A growing SaaS ERP company was struggling with its legacy data lake: pipelines ran slowly, data was hard to discover, and the team spent more time troubleshooting than building new features.
Solution
I architected and scaled their analytical data platform, migrating it from a legacy data lake to a modern Lakehouse architecture built with PySpark, Apache Iceberg, AWS Glue, and Python. I implemented the Medallion architecture pattern (Bronze, Silver, Gold), with clear separation between ingestion, refinement, and curated analytical datasets.
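As a rough illustration of how the three Medallion layers divide responsibility, the sketch below runs the same flow over in-memory Python records. It is not the project's code: the invoice records, field names, and function names are all hypothetical, and on the real platform each step would be a PySpark job reading and writing Iceberg tables.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical raw records, as they might arrive from a source system.
RAW_EVENTS = [
    {"invoice_id": "A1", "tenant": "acme", "amount": "100.50"},
    {"invoice_id": "A1", "tenant": "acme", "amount": "100.50"},        # duplicate delivery
    {"invoice_id": "B7", "tenant": "acme", "amount": "not-a-number"},  # malformed amount
    {"invoice_id": "C3", "tenant": "globex", "amount": "20.00"},
]


def to_bronze(raw):
    """Bronze: keep every record as received, adding only ingestion metadata."""
    return [{**row, "_ingest_seq": seq} for seq, row in enumerate(raw)]


def to_silver(bronze):
    """Silver: enforce types and deduplicate on the business key."""
    seen, clean = set(), []
    for row in bronze:
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            continue  # a real pipeline would likely quarantine this row instead
        if row["invoice_id"] in seen:
            continue
        seen.add(row["invoice_id"])
        clean.append(
            {"invoice_id": row["invoice_id"], "tenant": row["tenant"], "amount": amount}
        )
    return clean


def to_gold(silver):
    """Gold: a curated, analysis-ready aggregate (invoice total per tenant)."""
    totals = defaultdict(Decimal)
    for row in silver:
        totals[row["tenant"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(RAW_EVENTS)
silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the split is that Bronze never loses data, Silver is where quality rules live, and Gold holds only what analysts query.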
My Role
Data Platform Engineer – responsible for architecture design, pipeline development, orchestration, and infrastructure automation.
Key Deliverables
- Lakehouse architecture with Medallion pattern using Apache Iceberg
- Modular PySpark pipelines with configuration-driven jobs
- Apache Airflow orchestration on AWS ECS with scheduling and retries
- Terraform-based infrastructure for reproducible deployments
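To make "configuration-driven jobs" concrete, here is a minimal sketch of one way such a job could be wired up. It uses plain Python lists in place of Spark DataFrames so it runs anywhere. `JobConfig`, the `TRANSFORMS` registry, the step names, and the table names are assumptions made for illustration, not the project's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]

# Registry mapping step names used in configuration to transform functions.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str):
    """Decorator that makes a transform addressable by name from config."""
    def decorator(fn: Transform) -> Transform:
        TRANSFORMS[name] = fn
        return fn
    return decorator


@register("lowercase_keys")
def lowercase_keys(rows: List[Row]) -> List[Row]:
    return [{key.lower(): value for key, value in row.items()} for row in rows]


@register("drop_nulls")
def drop_nulls(rows: List[Row]) -> List[Row]:
    return [row for row in rows if all(value is not None for value in row.values())]


@dataclass(frozen=True)
class JobConfig:
    name: str
    source: str
    target: str
    steps: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "JobConfig":
        steps = list(raw.get("steps", []))
        unknown = [step for step in steps if step not in TRANSFORMS]
        if unknown:
            # Fail before the job runs rather than partway through it.
            raise ValueError(f"unknown steps in job {raw.get('name')!r}: {unknown}")
        return cls(name=raw["name"], source=raw["source"], target=raw["target"], steps=steps)


def run_job(config: JobConfig, rows: List[Row]) -> List[Row]:
    """Apply the configured steps in order; the job code itself never changes."""
    for step in config.steps:
        rows = TRANSFORMS[step](rows)
    return rows


config = JobConfig.from_dict({
    "name": "customers_silver",          # hypothetical job name
    "source": "bronze.customers",        # hypothetical table names
    "target": "silver.customers",
    "steps": ["lowercase_keys", "drop_nulls"],
})
result = run_job(config, [{"ID": 1, "Name": "Ada"}, {"ID": 2, "Name": None}])
```

With this shape, adding a new dataset means writing a new configuration entry instead of a new pipeline module, and validating step names at load time surfaces typos before any data is touched.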