Overview
Understand an end-to-end data engineering project for the retail domain; design and implement scalable ETL pipelines for retail data; implement key techniques such as incremental loads, SCD Type 2, a metadata-driven approach, Medallion Architecture, error handling, CDM, and CI/CD; and develop and deploy data solutions with CI/CD practices
Aspiring data engineers, data professionals, and anyone preparing for data engineering interviews
Basic knowledge of Python and SQL
This project focuses on building a data lake in Google Cloud Platform (GCP) for the retail domain.
The goal is to centralize, clean, and transform data from multiple sources, enabling retailers and their suppliers to streamline billing, order processing, and revenue tracking.
GCP Services Used:
Google Cloud Storage (GCS): Stores raw and processed data files (an ingestion sketch follows this list).
BigQuery: Serves as the analytical engine for storing and querying structured data.
Dataproc: Used for large-scale data processing with Apache Spark.
Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.
Cloud SQL (MySQL): Stores transactional retailer and supplier data.
GitHub & Cloud Build: Enables version control and CI/CD implementation.
CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.
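To make the GCS-to-BigQuery hand-off concrete, here is a minimal Python sketch using the google-cloud-storage and google-cloud-bigquery client libraries. The bucket, object path, and table ID are hypothetical placeholders, not the project's actual names:

```python
from google.cloud import bigquery, storage

# Hypothetical names -- the real project's buckets/datasets may differ.
BUCKET = "retail-datalake-raw"
GCS_PATH = "orders/2024-01-01/orders.csv"
TABLE_ID = "my-project.bronze.orders"

# 1) Land the raw extract in Google Cloud Storage.
storage_client = storage.Client()
storage_client.bucket(BUCKET).blob(GCS_PATH).upload_from_filename("orders.csv")

# 2) Load the file from GCS into a bronze-layer BigQuery table.
bq_client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,        # skip the header row
    autodetect=True,            # infer the schema for this sketch
    write_disposition="WRITE_APPEND",
)
load_job = bq_client.load_table_from_uri(
    f"gs://{BUCKET}/{GCS_PATH}", TABLE_ID, job_config=job_config
)
load_job.result()               # block until the load job completes
```

In the actual pipeline, loads like this would be triggered from Airflow rather than run by hand.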
Techniques involved:
Metadata-Driven Approach
SCD Type 2 Implementation (see the sketch after this list)
CDM (Common Data Model)
Medallion Architecture
Logging and Monitoring
Error Handling
Optimizations
CI/CD Implementation
Many more best practices
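As an illustration of the SCD Type 2 item above, here is a minimal two-step sketch against BigQuery, driven from Python. The table names, business key (customer_id), and tracked attributes (address, email) are assumptions for the example; a metadata-driven implementation would read them from a config table rather than hard-coding them:

```python
from google.cloud import bigquery

# Hypothetical tables/columns -- placeholders, not the project's real schema.
TARGET = "my-project.silver.dim_customer"    # SCD2 dimension table
STAGING = "my-project.bronze.customer_stg"   # latest source extract

# Step 1: expire the current row for any customer whose tracked
# attributes changed (close it out with an end date).
expire_sql = f"""
UPDATE `{TARGET}` tgt
SET is_current = FALSE, end_date = CURRENT_DATE()
FROM `{STAGING}` src
WHERE tgt.customer_id = src.customer_id
  AND tgt.is_current = TRUE
  AND (tgt.address <> src.address OR tgt.email <> src.email)
"""

# Step 2: insert a fresh "current" row for changed and brand-new customers
# (after step 1, changed customers no longer have a current row).
insert_sql = f"""
INSERT INTO `{TARGET}` (customer_id, address, email, start_date, end_date, is_current)
SELECT src.customer_id, src.address, src.email, CURRENT_DATE(), NULL, TRUE
FROM `{STAGING}` src
LEFT JOIN `{TARGET}` tgt
  ON tgt.customer_id = src.customer_id AND tgt.is_current = TRUE
WHERE tgt.customer_id IS NULL
"""

client = bigquery.Client()
for sql in (expire_sql, insert_sql):
    client.query(sql).result()  # run each statement and wait for completion
```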
Data Sources
MySQL Retailer Database (an incremental-extract sketch follows this list)
MySQL Supplier Database
API Reviews (api-reviews)
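Below is a minimal sketch of an incremental pull from one of these MySQL sources using pandas and SQLAlchemy. The connection string, table, and watermark column (updated_at) are hypothetical; in the actual project, credentials would come from something like Secret Manager and watermarks from a metadata table:

```python
import pandas as pd
import sqlalchemy

# Hypothetical connection details and watermark -- placeholders only.
engine = sqlalchemy.create_engine(
    "mysql+pymysql://user:password@host:3306/retailer_db"
)
last_watermark = "2024-01-01 00:00:00"  # last successful load for this table

# Incremental extract: only rows modified since the previous run.
df = pd.read_sql(
    sqlalchemy.text("SELECT * FROM orders WHERE updated_at > :wm"),
    engine,
    params={"wm": last_watermark},
)

# Land the delta in GCS as CSV (pandas writes gs:// paths when gcsfs is installed).
df.to_csv("gs://retail-datalake-raw/orders/delta.csv", index=False)
```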
Expected Outcomes
Efficient Data Pipeline: Automating the ingestion and transformation of retail data.
Structured Data Warehouse: Gold tables in BigQuery for analytical queries.
BI Reporting: Looker is used to generate dashboards and reports from the gold-layer tables.
All processes (data extraction, loading into GCS, transformation in BigQuery) are orchestrated with Apache Airflow, ensuring automation, scheduling, and monitoring; a minimal DAG sketch follows below.
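Here is a minimal sketch of such an orchestration DAG for Cloud Composer (Airflow 2.4+). The DAG ID and task bodies are placeholders; the real pipelines would invoke Dataproc jobs and BigQuery loads inside these tasks:

```python
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task bodies -- the real project plugs in its own
# extract / load / transform logic (Dataproc jobs, BigQuery SQL, etc.).
def extract_to_gcs():
    ...  # pull incremental data from Cloud SQL and land files in GCS

def load_to_bigquery():
    ...  # load the GCS files into bronze-layer BigQuery tables

def build_gold_tables():
    ...  # run silver/gold transformations in BigQuery

with DAG(
    dag_id="retail_etl_daily",                 # hypothetical DAG id
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_gcs",
                             python_callable=extract_to_gcs)
    load = PythonOperator(task_id="load_to_bigquery",
                          python_callable=load_to_bigquery)
    gold = PythonOperator(task_id="build_gold_tables",
                          python_callable=build_gold_tables)

    extract >> load >> gold  # simple linear dependency chain
```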
Saidhul Shaik
I am a seasoned Cloud Data Engineer with expertise in GCP, Azure, and AWS, specializing in data engineering, analytics, and DevOps. With years of hands-on experience in building scalable data pipelines, optimizing Apache Spark, and managing cloud migrations, I am passionate about helping learners bridge the gap between theory and real-world applications. As the founder of Skill Vane Software Institute, I mentor aspiring data engineers, guiding them through industry-standard best practices, hands-on projects, and interview preparation. My goal is to empower professionals with the skills needed to thrive in cloud and big data ecosystems.
