Owner: Muhammad Iman Qayyum Sufi Bin Mohamed Sufi

Email: [email protected]

Created On: January 2025

Project Overview:

This project builds a Lakehouse data pipeline for processing fitness data. The goal is to ingest and process user workout and gym activity data. I will implement a structured data flow (Bronze, Silver, Gold layers), using batch processing for historical data and Spark Structured Streaming for real-time data.
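To make the layered flow concrete, here is a minimal sketch in plain Python (not Spark) of how records might move from Bronze to Silver to Gold. The event fields, the `source` tag, and the function names are all hypothetical and chosen only for illustration; in the actual pipeline each layer would be a Spark table rather than a Python list.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw workout events; field names are illustrative assumptions.
RAW_EVENTS = [
    {"user_id": "1", "workout": "run", "minutes": "30", "ts": "2025-01-05T07:00:00"},
    {"user_id": "1", "workout": "run", "minutes": "30", "ts": "2025-01-05T07:00:00"},  # duplicate
    {"user_id": "2", "workout": "cycle", "minutes": "abc", "ts": "2025-01-05T08:00:00"},  # bad value
    {"user_id": "2", "workout": "lift", "minutes": "45", "ts": "2025-01-06T18:30:00"},
]


def to_bronze(raw_events):
    """Bronze: keep records as received, adding only ingestion metadata."""
    return [dict(event, source="hypothetical_feed") for event in raw_events]


def to_silver(bronze_rows):
    """Silver: drop duplicates and invalid rows, and cast fields to proper types."""
    seen, silver = set(), []
    for row in bronze_rows:
        key = (row["user_id"], row["workout"], row["ts"])
        if key in seen or not row["minutes"].isdigit():
            continue
        seen.add(key)
        silver.append({
            "user_id": int(row["user_id"]),
            "workout": row["workout"],
            "minutes": int(row["minutes"]),
            "ts": datetime.fromisoformat(row["ts"]),
        })
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned rows into total workout minutes per user."""
    totals = defaultdict(int)
    for row in silver_rows:
        totals[row["user_id"]] += row["minutes"]
    return dict(totals)


bronze = to_bronze(RAW_EVENTS)
silver = to_silver(bronze)
gold = to_gold(silver)
```

The same three steps could in principle be applied to a full historical load or to each incoming batch of new events; only the way the input arrives differs.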

Technology Used:

  1. Programming Language and Frameworks - Python, PySpark, Spark Structured Streaming
  2. Query Language - SQL
  3. Cloud Platform - Microsoft Azure

Requirements:

About the datasets:

[Image: image.png]