Owner: Muhammad Iman Qayyum Sufi Bin Mohamed Sufi
Email: [email protected]
Created On: January 2025
Project Overview:
This project focuses on building a Lakehouse data pipeline to process fitness data efficiently.
I will implement a structured data flow (Bronze, Silver, and Gold layers) using both batch processing and Spark Structured Streaming to handle historical and real-time data. The goal is to ingest and process user workout and gym activity data end to end.
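As a rough sketch of the Bronze-to-Silver step described above, the cleaning logic can be illustrated in plain Python. This is only an assumption-laden illustration: the field names (`user_id`, `ts`, `bpm`) are invented for the example, and the actual pipeline would express the same logic with PySpark DataFrames over Delta tables rather than Python lists.

```python
# Illustrative sketch of the Bronze -> Silver cleaning step.
# Field names (user_id, ts, bpm) are assumptions for illustration only;
# the real pipeline would use PySpark DataFrames over Delta tables.

def bronze_to_silver(bronze_records):
    """Deduplicate raw records and drop rows missing required keys."""
    seen = set()
    silver = []
    for rec in bronze_records:
        # Require a user id and a timestamp; skip malformed rows.
        if rec.get("user_id") is None or rec.get("ts") is None:
            continue
        key = (rec["user_id"], rec["ts"])
        if key in seen:  # drop exact duplicates (e.g. re-delivered events)
            continue
        seen.add(key)
        # Normalize types: user_id as str, BPM as int where present.
        silver.append({
            "user_id": str(rec["user_id"]),
            "ts": rec["ts"],
            "bpm": int(rec["bpm"]) if rec.get("bpm") is not None else None,
        })
    return silver

bronze = [
    {"user_id": 1, "ts": "2025-01-01T10:00:00", "bpm": "98"},
    {"user_id": 1, "ts": "2025-01-01T10:00:00", "bpm": "98"},   # duplicate
    {"user_id": None, "ts": "2025-01-01T10:01:00", "bpm": "99"},  # bad row
]
print(bronze_to_silver(bronze))
```

In the Spark version, the deduplication and null filtering would typically be a `dropDuplicates` plus `filter` over the streaming Bronze table before writing to Silver.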
Technology Used:
- Languages & Frameworks - Python, PySpark, Spark Structured Streaming
- Query Language - SQL
- Microsoft Azure
- Azure Data Lake Storage
- Azure Databricks with Unity Catalog
- Azure DevOps
Requirements:
- Lakehouse Architecture – Implement a Lakehouse platform using the medallion architecture (Bronze, Silver, Gold) for structured data storage and processing.
- Data Ingestion – Collect and ingest fitness data (workouts, BPM, logins) from APIs, databases, and Kafka, supporting both batch and streaming workflows.
- Data Processing – Transform raw data into cleaned and aggregated datasets using Databricks, PySpark, and Delta Lake, ensuring data quality and efficiency.
- Analytics & Reporting – Prepare Workout BPM Summary and Gym Summary datasets for insights, dashboards, and reporting.
- Security & Automation – Implement role-based access control (RBAC) with Unity Catalog, CI/CD pipelines, and automated testing for deployment and data validation.
- Scalability & Performance – Design for high availability, scalability, and cost efficiency, optimizing queries and storage for real-time and batch processing.
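The Workout BPM Summary requirement above can be sketched as a Gold-layer aggregation. The plain-Python version below is a hedged illustration (column names `user_id`, `workout_id`, `bpm` are assumed, not the actual schema); in Databricks the same computation would be a PySpark `groupBy().agg()` over the Silver Delta table:

```python
# Illustrative Gold-layer aggregation for the Workout BPM Summary.
# Column names (user_id, workout_id, bpm) are assumptions; the real job
# would be a PySpark groupBy().agg() over the Silver Delta table.
from collections import defaultdict

def workout_bpm_summary(silver_records):
    """Aggregate min/avg/max heart rate per (user, workout) pair."""
    groups = defaultdict(list)
    for rec in silver_records:
        groups[(rec["user_id"], rec["workout_id"])].append(rec["bpm"])
    return [
        {
            "user_id": user,
            "workout_id": workout,
            "min_bpm": min(bpms),
            "avg_bpm": round(sum(bpms) / len(bpms), 1),
            "max_bpm": max(bpms),
        }
        for (user, workout), bpms in groups.items()
    ]

silver = [
    {"user_id": "u1", "workout_id": "w1", "bpm": 90},
    {"user_id": "u1", "workout_id": "w1", "bpm": 110},
    {"user_id": "u2", "workout_id": "w1", "bpm": 100},
]
print(workout_bpm_summary(silver))
```

The Gym Summary dataset would follow the same pattern with a different grouping key (for example, gym location and session date).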
About the datasets:
