Uber Data Analytics | Modern Data Engineering GCP Project

Architecture

Introduction

The goal of this project is to perform data analytics on Uber data using a range of tools and technologies: Google Cloud Storage, Python, a Compute Engine instance, the Mage data pipeline tool, BigQuery, and Looker Studio.

Technology Used

Programming Language - Python

Google Cloud Platform - Google Cloud Storage, Compute Engine, BigQuery, Looker Studio

Modern Data Pipeline Tool - mage.ai

Contribute to this open source project (Mage) - github.com/mage-ai/mage-ai

Dataset Used

TLC Trip Record Data

Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
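The trip records arrive as a flat CSV file with one row per trip, and every value is a string. The sketch below reads such rows with Python's standard `csv` module and converts them to typed values. It makes several assumptions:

- The two sample rows are made-up illustrative values.
- Only a subset of the TLC columns is included.
- The timestamp format `%Y-%m-%d %H:%M:%S` is assumed.
- The names `parse_trip`, `read_trips`, and `duration_min` are hypothetical helpers and are not part of the project.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in the yellow-taxi CSV layout (only a subset of the columns).
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,fare_amount,total_amount
1,2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,9.0,12.35
2,2016-03-01 00:00:00,2016-03-01 00:11:06,1,2.90,11.0,15.35
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout of the CSV export


def parse_trip(row: dict) -> dict:
    """Convert one CSV row (all strings) into typed Python values."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TS_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TS_FORMAT)
    return {
        "VendorID": int(row["VendorID"]),
        "tpep_pickup_datetime": pickup,
        "tpep_dropoff_datetime": dropoff,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),  # miles, per the data dictionary
        "fare_amount": float(row["fare_amount"]),
        "total_amount": float(row["total_amount"]),
        # Hypothetical derived field for illustration; not a column in the raw data.
        "duration_min": (dropoff - pickup).total_seconds() / 60,
    }


def read_trips(text: str) -> list[dict]:
    """Parse CSV text with a header row into a list of typed trip dicts."""
    return [parse_trip(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for trip in read_trips(SAMPLE_CSV):
        print(trip["VendorID"], trip["trip_distance"], round(trip["duration_min"], 2))
```

Parsing the types once, up front, means later steps (splitting timestamps into hour, day, and weekday, or summing fares) operate on real `datetime` and `float` values rather than strings.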

Here is the dataset used in the video - github.com/darshilparmar/uber-etl-pipeline-..

More information about the dataset can be found here:

Website - nyc.gov/site/tlc/about/tlc-trip-record-data..

Data Dictionary - nyc.gov/assets/tlc/downloads/pdf/data_dicti..
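The data dictionary defines what the numeric `RatecodeID` and `payment_type` codes mean. The `rate_code_dim` and `payment_type_dim` tables pair those codes with readable names.

The sketch below builds those two lookup dimensions. Keep the following caveats in mind:

- The mappings follow the yellow taxi data dictionary as commonly published. Later revisions have added codes, so check them against the current PDF.
- `build_lookup_dim` is a hypothetical helper.
- Labeling unmapped codes "Unknown code" is a choice made for this sketch.

```python
# Code-to-name mappings from the TLC yellow taxi data dictionary
# (later revisions may add codes; verify against the current PDF).
RATE_CODE_NAMES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}

PAYMENT_TYPE_NAMES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}


def build_lookup_dim(codes, names, id_col, code_col, name_col):
    """Return one row per distinct code: surrogate id, raw code, readable name.

    Codes missing from `names` are labeled "Unknown code" instead of being
    dropped, so no fact row loses its dimension entry (a sketch choice).
    """
    return [
        {id_col: surrogate_id, code_col: code, name_col: names.get(code, "Unknown code")}
        for surrogate_id, code in enumerate(sorted(set(codes)))
    ]


if __name__ == "__main__":
    rate_code_dim = build_lookup_dim(
        [1, 1, 2, 99], RATE_CODE_NAMES, "rate_code_id", "RatecodeID", "rate_code_name"
    )
    for row in rate_code_dim:
        print(row)
```

Repeated codes collapse to a single dimension row, and a code the mapping does not know (99 here) still gets a row instead of silently disappearing.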

Data Model

passenger_count_dim: passenger_count_id, passenger_count

trip_distance_dim: trip_distance_id, trip_distance

rate_code_dim: rate_code_id, RatecodeID, rate_code_name

payment_type_dim: payment_type_id, payment_type, payment_type_name

datetime_dim: datetime_id, tpep_pickup_datetime, pick_hour, pick_day, pick_month, pick_year, pick_weekday, tpep_dropoff_datetime, drop_hour, drop_day, drop_month, drop_year, drop_weekday

pickup_location_dim: pickup_location_id, pickup_latitude, pickup_longitude

dropoff_location_dim: dropoff_location_id, dropoff_latitude, dropoff_longitude

fact_table: trip_id, VendorID, datetime_id, passenger_count_id, trip_distance_id, rate_code_id, store_and_fwd_flag, pickup_location_id, dropoff_location_id, payment_type_id, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount
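The tables above form a star schema. `fact_table` stores the measures (fares, taxes, tips, totals) plus one id column that points into each `*_dim` table.

The pure-Python sketch below shows that split. It covers only `datetime_dim`, `passenger_count_dim`, `trip_distance_dim`, and a subset of the fact columns; the other dimensions follow the same pattern. It rests on these assumptions:

- Trips are already parsed into dicts with typed values.
- Surrogate ids are assigned by deduplicating each dimension's natural key. The project's own transformation may assign ids differently.
- The sample trips are hypothetical, as are the helpers `get_or_add`, `datetime_row`, and `build_star_schema`.

```python
from datetime import datetime

# Two hypothetical, already-typed trips (a subset of the raw columns).
SAMPLE_TRIPS = [
    {"VendorID": 1, "tpep_pickup_datetime": datetime(2016, 3, 1, 0, 0),
     "tpep_dropoff_datetime": datetime(2016, 3, 1, 0, 7, 55),
     "passenger_count": 1, "trip_distance": 2.5, "fare_amount": 9.0, "total_amount": 12.35},
    {"VendorID": 2, "tpep_pickup_datetime": datetime(2016, 3, 1, 0, 0),
     "tpep_dropoff_datetime": datetime(2016, 3, 1, 0, 11, 6),
     "passenger_count": 1, "trip_distance": 2.9, "fare_amount": 11.0, "total_amount": 15.35},
]


def get_or_add(dim: list, index: dict, key, make_row) -> int:
    """Return the surrogate id for `key`, appending a new dimension row if unseen."""
    if key not in index:
        index[key] = len(dim)
        dim.append(make_row(index[key]))
    return index[key]


def datetime_row(datetime_id: int, pickup: datetime, dropoff: datetime) -> dict:
    """Split both timestamps into the calendar parts listed for datetime_dim."""
    row = {"datetime_id": datetime_id,
           "tpep_pickup_datetime": pickup, "tpep_dropoff_datetime": dropoff}
    for prefix, ts in (("pick", pickup), ("drop", dropoff)):
        row[f"{prefix}_hour"] = ts.hour
        row[f"{prefix}_day"] = ts.day
        row[f"{prefix}_month"] = ts.month
        row[f"{prefix}_year"] = ts.year
        row[f"{prefix}_weekday"] = ts.weekday()  # Monday == 0
    return row


def build_star_schema(trips: list[dict]) -> dict:
    """Split flat trip records into deduplicated dimension tables and a fact table."""
    tables = {"datetime_dim": [], "passenger_count_dim": [], "trip_distance_dim": []}
    index = {name: {} for name in tables}  # natural key -> surrogate id, per dimension
    fact_table = []
    for trip_id, t in enumerate(trips):
        pickup, dropoff = t["tpep_pickup_datetime"], t["tpep_dropoff_datetime"]
        # make_row is called immediately inside get_or_add, so the lambdas
        # safely capture this iteration's loop variables.
        datetime_id = get_or_add(
            tables["datetime_dim"], index["datetime_dim"], (pickup, dropoff),
            lambda i: datetime_row(i, pickup, dropoff))
        passenger_count_id = get_or_add(
            tables["passenger_count_dim"], index["passenger_count_dim"], t["passenger_count"],
            lambda i: {"passenger_count_id": i, "passenger_count": t["passenger_count"]})
        trip_distance_id = get_or_add(
            tables["trip_distance_dim"], index["trip_distance_dim"], t["trip_distance"],
            lambda i: {"trip_distance_id": i, "trip_distance": t["trip_distance"]})
        # The fact row keeps measures and foreign keys; descriptive fields live in the dims.
        fact_table.append({
            "trip_id": trip_id, "VendorID": t["VendorID"],
            "datetime_id": datetime_id,
            "passenger_count_id": passenger_count_id,
            "trip_distance_id": trip_distance_id,
            "fare_amount": t["fare_amount"], "total_amount": t["total_amount"],
        })
    tables["fact_table"] = fact_table
    return tables


if __name__ == "__main__":
    schema = build_star_schema(SAMPLE_TRIPS)
    for name, rows in schema.items():
        print(name, len(rows))
```

Deduplicating keeps the dimension tables small: both sample trips share one `passenger_count_dim` row. Giving every trip its own dimension row would also fit the same table layout, but the dimensions would then be as large as the fact table.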