Covid-19 end to end - Data analysis model

Internship
2 weeks
Data engineering - ETL

This project demonstrates a scalable data pipeline built on AWS services. The pipeline covers the full path from data ingestion and processing to storage in a data warehouse, followed by visualization. The technologies used are AWS S3, Glue, Athena, Redshift, and Python scripting in Jupyter Notebooks. The final output can be visualized in Power BI or Tableau. I've tried to capture all the basics here, so you can follow this readme to learn your way around a cloud-based data engineering project :)

Structure

This project is organized as follows:

  • Datasets: Stored in AWS S3.

  • Data Processing: Performed using AWS Glue.

  • Data Querying: Handled through AWS Athena.

  • Data Warehousing: Managed using AWS Redshift.

  • Scripting: ETL tasks automated with Python in Jupyter Notebooks.

  • Data Visualization: Achieved through Power BI or Tableau.

Tools used

  • AWS S3: For storing raw and processed data.

  • AWS Glue: To crawl the datasets and define schemas.

  • AWS Athena: To run SQL queries on data stored in S3.

  • AWS Redshift: To host the data warehouse and run complex queries.

  • Python (Jupyter Notebooks): For scripting ETL processes.

  • Power BI / Tableau: For visualizing data stored in Redshift.
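
To give a feel for how the Athena piece can be scripted from a notebook, here is a minimal sketch, not the project's actual code. The helper name `run_athena_query` and its parameters are my own illustration. It assumes you pass in a client created with `boto3.client("athena")` and an S3 location where Athena may write query results. The three client calls used (`start_query_execution`, `get_query_execution`, `get_query_results`) are the standard boto3 Athena operations.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Run one SQL statement through Athena and return rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client (hypothetical helper,
    not part of the original project).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # Only the first page of results is read here; large result sets
    # would need pagination. For SELECT statements the first row
    # typically holds the column names.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

In a notebook this could be called as `run_athena_query(boto3.client("athena"), "SELECT ...", "your_database", "s3://your-bucket/athena-results/")`, where the database and bucket names are placeholders for your own.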

Running the project

  • Set up S3 buckets: Upload your datasets to S3.

  • Configure Glue: Set up crawlers and jobs to process the data.

  • Run Athena Queries: Ensure data correctness using Athena.

  • Execute Python Scripts: Run the provided Jupyter Notebooks to perform ETL tasks.

  • Set up Redshift: Create the necessary tables and load data.

  • Visualize: Connect to Redshift from Power BI or Tableau to create visualizations.
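
The upload, crawl, and load steps above can be sketched in Python roughly as follows. This is an illustrative outline under stated assumptions, not the notebooks shipped with the project: the function names, bucket, crawler, role, and table names are all hypothetical, the datasets are assumed to be CSV files with a header row, and loading into Redshift is assumed to happen through a `COPY` statement. The clients are passed in (for example `boto3.client("s3")` and `boto3.client("glue")`) so the helpers stay easy to test.

```python
def upload_dataset(s3, local_path, bucket, key):
    """Upload one local file to S3 and return its s3:// URI."""
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def crawl_prefix(glue, crawler_name, role_arn, database, s3_path):
    """Create a Glue crawler over an S3 prefix and start it.

    The crawler infers the schema and registers tables in `database`,
    which Athena can then query.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)


def build_copy_statement(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement for a CSV file with one header row.

    Assumption: the target table already exists and the IAM role can read
    the S3 object. The values are interpolated directly, so only pass
    trusted names here.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )
```

The string returned by `build_copy_statement` would then be executed against the cluster with whichever Redshift connection you already use in the notebook.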

Conclusion

This project illustrates how to build a comprehensive data pipeline using AWS services, culminating in the ability to visualize data through Power BI or Tableau. It serves as a practical example of integrating cloud-based services for data processing and analysis.

Let's Connect!

© Copyright 2023. All rights Reserved.
