The Data Analysis track teaches students to collect, process, analyze, and visualize data to inform decision-making. It places significant emphasis on using artificial intelligence (AI) and machine learning (ML) tools to enhance productivity and analytical capabilities.
Module 1: Foundations of Data & Data Science Workflow (Google & Intel Model)
This module introduces core concepts, including the definitions of AI, machine learning, and deep learning, along with their historical development and current importance. It covers data types and structures and emphasizes the significance of datasets and data sources. Students will learn about data science workflows, including identifying key roles, structuring an AI team, recognizing common misconceptions in data science, and maintaining AI models after deployment. The module also covers data-driven problem-solving methodologies.
Objective: Students will understand fundamental data concepts, the data science lifecycle, and the critical role of data in AI.
Module 2: Data Collection, Preparation & Cleaning (IBM & Google Model)
This module focuses on the practical skills of data collection and analysis. It covers classifying data and ensuring that data is usable by an organization. A significant portion is dedicated to preparing data for analysis, including data wrangling, data augmentation, and feature engineering. Students will learn techniques for cleaning data and transforming it from a raw state to a clean one. The module also addresses identifying and mitigating overfitting and underfitting, working with popular datasets, applying data preprocessing methods, and labeling data effectively.
Objective: Students will acquire skills in collecting, preparing, and cleaning data for analysis, while understanding common data quality issues.
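As an illustration of the raw-to-clean transformation described in this module, the following is a minimal sketch in Python using only the standard library. The sample rows, the column names (name, age, city), the list of missing-value markers, and the helper names parse_age and clean_rows are all hypothetical and chosen for illustration; they are not taken from the course materials.

```python
from typing import Optional

# Hypothetical raw survey rows; the column names and values are illustrative only.
RAW_ROWS = [
    {"name": " Alice ", "age": "34", "city": "london"},
    {"name": "Bob", "age": "", "city": "Paris"},
    {"name": " Alice ", "age": "34", "city": "london"},  # exact duplicate
    {"name": "Chen", "age": "n/a", "city": " berlin"},
]

# Assumed set of strings that should be treated as "no value recorded".
MISSING_MARKERS = {"", "n/a", "na", "null"}


def parse_age(value: str) -> Optional[int]:
    """Return the age as an int, or None when it is missing or not numeric."""
    text = value.strip().lower()
    if text in MISSING_MARKERS or not text.isdigit():
        return None
    return int(text)


def clean_rows(rows):
    """Trim whitespace, normalize casing, parse types, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["name"].strip(),
            parse_age(row["age"]),
            row["city"].strip().title(),
        )
        if record in seen:
            continue  # skip rows that are identical after cleaning
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1], "city": record[2]})
    return cleaned


CLEAN_ROWS = clean_rows(RAW_ROWS)
for row in CLEAN_ROWS:
    print(row)
```

Missing ages are kept as None rather than dropped or filled in, so that the decision about how to handle them (removal, imputation, or flagging) stays explicit in a later step.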
Module 3: Statistical Analysis & Data Manipulation (IBM & Google Model)
This module introduces students to inferential and descriptive statistics. It covers differentiating between common data distribution types and methods for sampling a population. Students will learn to calculate descriptive statistics, with practical application examples using tools like Microsoft Excel. The module emphasizes applying inferential statistical analysis to formulate data-driven recommendations and building formulas for manipulating complex datasets.
Objective: Students will apply statistical concepts to analyze data and derive meaningful insights.
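The module names Microsoft Excel as an example tool; the same ideas can also be sketched in Python with the standard library. The example below draws a simple random sample from a simulated population, computes descriptive statistics, and derives an approximate 95% confidence interval for the mean as a basic inferential step. The population values, the seed, the sample size, and the helper name describe are assumptions made for illustration only.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated order values (illustrative data only).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.gauss(50.0, 10.0), 2) for _ in range(1000)]

# Simple random sample of 100 values, drawn without replacement.
sample = rng.sample(population, 100)


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


sample_stats = describe(sample)

# Standard error of the mean and an approximate 95% confidence interval,
# using the normal approximation (reasonable for a sample of this size).
standard_error = sample_stats["stdev"] / sample_stats["n"] ** 0.5
ci_low = sample_stats["mean"] - 1.96 * standard_error
ci_high = sample_stats["mean"] + 1.96 * standard_error

print(sample_stats)
print(f"Approximate 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The descriptive statistics summarize the sample itself, while the confidence interval is an inferential statement about the population mean, which is the distinction this module draws.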
Module 4: Data Visualization & Storytelling (Google & IBM Model)
This module focuses on the effective communication of data insights. It covers the principles of data storytelling with visualizations, along with a range of data visualization and presentation techniques. Students will explore ideas for creating compelling data visualizations and learn to use relevant tools such as IBM Watson Studio; the material also connects implicitly to tools such as Power BI from Microsoft 365 Fundamentals.
Objective: Students will effectively communicate data insights through compelling visualizations and storytelling.
Module 5: Programming for Data Analysis & Machine Learning (Google, Intel, AWS Model)
This module equips students with programming skills essential for data analysis and machine learning.
R Programming for Analysis: Provides an introduction to the R programming language and its use in data analysis, including writing R scripts for automating data loading.
Python for AI/Data Analysis: Explores how the Python programming language applies to AI and practical data science, including working with Amazon SageMaker.
Machine Learning Fundamentals: Covers the basics of supervised and unsupervised learning, an introduction to deep learning, the steps involved in building a neural network model, understanding convolutional neural networks (CNN), transfer learning, and common deep learning architectures.
Cloud Data Services (AWS): Introduces various AWS cloud data services, including building Data Lakes on AWS, Batch Data Analytics, Streaming Data Analytics, Data Warehousing on AWS, and using Amazon Redshift.
MLOps Engineering on AWS: Focuses on the operationalization of machine learning models in a production environment.
Objective: Students will utilize programming languages (R, Python) and cloud services for data analysis, and understand foundational machine learning concepts.
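To make the idea of supervised learning concrete, here is a minimal Python sketch that fits a straight line to labeled examples by ordinary least squares and then checks the fit on held-out data. It uses only the standard library; the synthetic data (generated from roughly y = 3x + 2 plus noise), the 80/20 split, and the helper names fit_line and mean_squared_error are assumptions for illustration, not part of the course materials.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical labeled data: inputs x with targets y = 3x + 2 plus random noise.
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.5) for x in xs]

# Shuffle, then hold out 20% of the examples for testing.
pairs = list(zip(xs, ys))
rng.shuffle(pairs)
train, test = pairs[:80], pairs[80:]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(points, slope, intercept):
    """Average squared difference between predictions and true targets."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


slope, intercept = fit_line(train)
train_mse = mean_squared_error(train, slope, intercept)
test_mse = mean_squared_error(test, slope, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Comparing the two error values is the basic check for generalization: a test error far above the training error would suggest overfitting, while high error on both would suggest underfitting.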
Module 6: AI Integration for Data Productivity (Google, IBM, Intel, AWS Model)
This module focuses on using AI to boost productivity and efficiency in data analysis workflows. It covers using AI to assist with cleaning and structuring data, building formulas, and manipulating complex datasets. Students will learn how AI can help with identifying questions to ask, preparing for data analysis, generating ideas for data visualizations, and creating R scripts that automate data loading from various sources. The module also introduces generative AI, covering concepts for executives, generative AI essentials, and the development of generative AI applications.
Objective: Students will apply AI tools and generative AI to enhance productivity and efficiency in data analysis workflows.