Data Science Curriculum

About the Program

Imagine you are hired by Chuimatt Supermarkets, one of the largest retail chains in East Africa, which is facing a significant problem: sales are stagnant and many of their outlets are closing down. They need your expertise to analyze what went wrong and provide a path to future growth. Their data is vast, yet they still manage it with outdated spreadsheets.

You are tasked with building a data-driven solution that will not only help Chuimatt understand their sales trends but also predict future sales, identify which departments are likely to perform well, and provide actionable insights to improve their overall performance.

Chuimatt wants to know:

  1. Why their sales are declining and which factors are causing the downturn.
  2. Which departments are underperforming and why.
  3. How to leverage historical data to predict sales, especially during peak seasons such as Christmas, Black Friday, and Back-to-School.

They give you three data files containing historical sales data, outlet details, and promotions. Your challenge is to process and analyze this data, uncover hidden insights, and predict the next year's sales.

Project Objectives:

  1. Sales Trend Analysis: Analyze past sales data across 50 outlets in East Africa, identifying trends, seasonal impacts, and major influencing factors (e.g., holidays, promotions, and external events like elections).
  2. Sales Prediction: Build a machine learning model to predict sales by department for the upcoming year, taking seasonality and external factors into account.
  3. Markdown & Promotion Impact: Assess how markdowns and promotions (like Black Friday sales) affect revenue and which holidays drive the most sales.
  4. Actionable Recommendations: Based on your insights, generate a set of business recommendations prioritized by their potential impact on Chuimatt’s revenue growth.
  5. Data Reporting & Visualization: Develop a web-based dashboard to display the insights in an easy-to-understand format for stakeholders at Chuimatt. The dashboard should allow them to query data, view sales performance by outlet and department, and assess the effectiveness of promotions.

Requirements

A computer with at least an Intel Core i5 processor and 8 GB of RAM. Prior programming experience is not required.

Student-to-teacher ratio of 10:1

Curriculum

  1. Introduction to Data Science
    • What is Data Science? Key concepts, processes, and applications
    • The role of a Data Scientist: Problem solver, data analyst, and decision-maker
    • Tools of the Trade: Python, Jupyter Notebook, SQL, visualization libraries
  2. Business Problem Understanding
    • Introduction to the business context of data science projects (e.g., sales forecasting, customer segmentation, anomaly detection, etc.)
    • Translating business goals into data science tasks: Framing the problem (e.g., predicting sales, understanding trends, improving operational efficiencies)
    • Case Study Exploration: Chuimatt Supermarkets (or other business cases)
  3. Project Framework
    • Defining project objectives and key deliverables
    • Setting up the project environment (tools, platforms, and software)
    • Overview of the project management process: Milestones, timelines, and collaboration

Module 1: Learning Python for Data Science

    Overview: This module is designed to introduce students to Python programming from scratch, with a focus on how Python is used in Data Science. By the end of this module, students will have a solid foundation in Python and be ready to dive into more advanced data science topics in subsequent modules.

  1.1 Introduction to Python
    • What is Python?
      • A versatile, beginner-friendly programming language used in a wide variety of fields, especially in data science and machine learning.
      • Python’s popularity in data science: Libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
    • Why Python for Data Science?
      • Easy to learn and use: Clear, readable syntax.
      • Extensive libraries and frameworks for data manipulation, analysis, and machine learning.
      • Strong community support.
    • Setting up Python Environment
      • Installing Python: Anaconda distribution (recommended for data science).
      • Installing Python on different platforms: Windows, macOS, Linux.
      • Setting up Jupyter Notebooks for interactive coding.
      • PyCharm or VS Code for writing Python scripts.
  1.2 Python Basics: Syntax and Data Types
    • Python Syntax and Structure
      • Writing Python code in a script or interactive environment.
      • Understanding indentation and code blocks.
      • Basic commands: Print statements, comments, basic input/output.
    • Data Types in Python
      • Numbers: Integers (int), Floating-point numbers (float).
      • Strings: Text data (str).
      • Boolean: True or False (bool).
      • Lists: Ordered collection of elements (list).
      • Tuples: Immutable sequence (tuple).
      • Dictionaries: Key-value pairs (dict).
      • Sets: Unordered collection of unique elements (set).
    • Basic Operations
      • Arithmetic operations (+, -, *, /).
      • String manipulation: Concatenation, slicing, formatting.
      • List operations: Indexing, slicing, appending.
      • Dictionary operations: Adding/removing keys, accessing values.
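
For a taste of what this unit covers, here is a minimal sketch (all values are made up for illustration):

```python
# Core data types and basic operations
price = 49.99                      # float
quantity = 3                       # int
product = "Maize Flour"            # str
in_stock = True                    # bool

# Arithmetic and string formatting
total = price * quantity
print(f"{product}: {quantity} units, total {total:.2f}")

# List and dictionary operations
basket = ["bread", "milk"]
basket.append("sugar")             # add an element
print(basket[0:2])                 # slicing: ['bread', 'milk']

prices = {"bread": 55, "milk": 60} # key-value pairs
prices["sugar"] = 120              # add a new key
print(prices.keys(), prices["milk"])
```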
  1.3 Control Flow: Conditional Statements and Loops
    • If-Else Statements
      • Writing conditional logic with if, elif, and else.
      • Understanding comparison operators (==, !=, <, >, etc.).
      • Using logical operators (and, or, not).
    • Loops in Python
      • For Loop: Iterating over a sequence (e.g., list, string, range).
      • While Loop: Repeating a block of code as long as a condition is true.
      • Using break and continue to control loop flow.
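
A short sketch of the control-flow constructs above, using hypothetical sales figures:

```python
# Conditional logic: classify a day's sales figure
sales = 12_500

if sales > 10_000:
    label = "strong"
elif sales > 5_000:
    label = "average"
else:
    label = "weak"
print(f"Sales were {label}")

# For loop over a sequence, skipping days with no data
daily_sales = [8_000, 0, 12_500, 9_300]
for day, amount in enumerate(daily_sales, start=1):
    if amount == 0:
        continue                  # jump to the next iteration
    print(f"Day {day}: {amount}")

# While loop: repeat as long as a condition holds
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1
```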
  1.4 Functions in Python
    • Defining Functions
      • The purpose of functions: Reusability, abstraction.
      • Function syntax: def function_name(arguments): and the return statement.
      • Calling functions with arguments.
    • Built-in Functions
      • Common Python built-in functions: print(), len(), type(), input().
      • Working with list comprehension for efficient loops and conditionals.
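
The following sketch illustrates these ideas with a hypothetical apply_discount function:

```python
# A reusable function with arguments, a default value, and a return value
def apply_discount(price, rate=0.10):
    """Return the price after applying a discount rate."""
    return price * (1 - rate)

print(apply_discount(200))          # uses the default 10% rate
print(apply_discount(200, 0.25))    # explicit 25% rate

# Common built-in functions
prices = [100, 250, 80, 400]
print(len(prices), type(prices))

# List comprehension: discount every price above 100 in one expression
discounted = [apply_discount(p) for p in prices if p > 100]
print(discounted)
```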
  1.5 Python Libraries: Introduction to Data Science Tools
    • Installing and Importing Libraries
      • Using pip and conda to install libraries.
      • Importing libraries in Python: import pandas as pd, import numpy as np, etc.
    • Exploring Essential Data Science Libraries
      • Pandas: For data manipulation and analysis.
      • NumPy: For numerical operations and handling arrays.
      • Matplotlib: For basic data visualization (charts, graphs).
      • Seaborn: For statistical data visualization.
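
A minimal sketch of the conventional imports and one small task per library (assumes the packages are installed, e.g. via pip install numpy pandas matplotlib):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: fast numerical operations on arrays
monthly_sales = np.array([120, 135, 150, 160])
print(monthly_sales.mean(), monthly_sales.max())

# Pandas: tabular data in a DataFrame
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                   "sales": monthly_sales})
print(df.describe())

# Matplotlib: a first chart
plt.plot(df["month"], df["sales"])
plt.title("Monthly sales")
plt.show()
```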
  1.6 Introduction to Data Structures in Python
    • Lists, Tuples, and Dictionaries
      • Creating, accessing, and manipulating lists, tuples, and dictionaries.
      • Common methods: .append(), .pop(), .remove(), .keys(), .values().
    • Using Collections
      • Introduction to collections module: Counter, defaultdict.
      • Working with sets and handling duplicates.
      • Sorting and filtering collections using list comprehensions.
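
A quick sketch of the collections module and set handling described above:

```python
from collections import Counter, defaultdict

# Counter: tally how often each item was sold
sold = ["bread", "milk", "bread", "sugar", "bread"]
print(Counter(sold).most_common(2))   # [('bread', 3), ('milk', 1)]

# defaultdict: group values without checking for missing keys first
sales_by_region = defaultdict(list)
for region, amount in [("Nairobi", 100), ("Mombasa", 80), ("Nairobi", 120)]:
    sales_by_region[region].append(amount)
print(dict(sales_by_region))

# Sets drop duplicates; a sorted comprehension filters and orders
unique_items = set(sold)
expensive = sorted(x for x in [55, 120, 60] if x > 59)
print(unique_items, expensive)
```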
  1.7 Working with Data in Python
    • Reading and Writing Files
      • Opening, reading, and writing files in Python.
      • Handling CSV files with Python’s built-in csv module.
      • Introduction to Pandas: Reading and writing data using pd.read_csv() and df.to_csv().
    • Introduction to Pandas DataFrames
      • What is a DataFrame? An overview of how it is used to store tabular data.
      • Loading and inspecting data with Pandas (head(), tail(), info()).
      • Accessing specific rows, columns, and elements in a DataFrame.
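
A self-contained sketch of the Pandas I/O and inspection methods above (the file name sales.csv is just a placeholder):

```python
import pandas as pd

# Write a tiny DataFrame to disk, then read it back with read_csv
pd.DataFrame({"outlet": ["A", "B"], "sales": [100, 80]}).to_csv(
    "sales.csv", index=False)
df = pd.read_csv("sales.csv")

print(df.head())         # first rows
df.info()                # column names, dtypes, non-null counts

# Accessing rows, columns, and single elements
print(df["sales"])       # one column as a Series
print(df.loc[0])         # first row by label
print(df.iloc[0, 1])     # row 0, column 1 by position
```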
  1.8 Basic Data Analysis and Visualization
    • Data Cleaning Basics
      • Handling missing data using Pandas (isnull(), dropna(), fillna()).
      • Dropping duplicates and cleaning strings.
      • Renaming columns and other data transformations.
    • Basic Data Visualization
      • Plotting simple graphs with Matplotlib: Line plots, bar charts, histograms.
      • Plotting data with Seaborn: Scatter plots, heatmaps, and pair plots.
      • Visualizing data trends: Time-series, relationships between variables.
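
A compact sketch combining the cleaning and plotting steps above, on a made-up weekly sales table:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "week":  [1, 2, 3, 4, 4],
    "sales": [100.0, None, 140.0, 150.0, 150.0],
})

# Cleaning: locate missing values, fill them, drop duplicates, rename
print(df.isnull().sum())
df["sales"] = df["sales"].fillna(df["sales"].mean())
df = df.drop_duplicates().rename(columns={"sales": "weekly_sales"})

# Visualization: a quick line plot of the cleaned series
plt.plot(df["week"], df["weekly_sales"], marker="o")
plt.xlabel("Week")
plt.ylabel("Sales")
plt.title("Weekly sales after cleaning")
plt.show()
```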
  1.9 Final Project: Python in Practice
    • Mini Project: Analyzing a Simple Dataset
      • Students will use what they've learned to analyze a small dataset (e.g., sales data or customer data).
      • Tasks: Load the data, clean it, perform basic analysis, and create visualizations.
      • Presenting the findings: Using Python and Jupyter Notebooks to document the analysis.

Module 2: Data Acquisition and Exploration

  1. Data Sources and Acquisition
    • Understanding different data formats: SQL databases, CSV files, APIs, web scraping
    • Connecting to databases (e.g., SQL, NoSQL) and data pipelines
    • Importing data into Python using Pandas, SQLAlchemy, and other tools
  2. Data Exploration and Cleaning
    • Overview of data exploration: Descriptive statistics, visualizations (e.g., histograms, bar plots, correlation heatmaps)
    • Data cleaning: Handling missing values, duplicate entries, and outliers
    • Data preprocessing: Encoding categorical variables, scaling numerical features, and normalizing data
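
As a small illustration of the preprocessing steps above (made-up data; assumes scikit-learn is installed):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "region": ["Nairobi", "Mombasa", "Nairobi", "Kisumu"],
    "sales":  [120.0, 80.0, 150.0, 60.0],
})

# Descriptive statistics as a first look at the data
print(df.describe(include="all"))

# Encode the categorical column as one-hot indicator columns
df = pd.get_dummies(df, columns=["region"])

# Scale the numeric feature to zero mean and unit variance
df["sales_scaled"] = StandardScaler().fit_transform(df[["sales"]]).ravel()
print(df)
```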
  3. Initial Insights from Data
    • Identifying key features and understanding relationships in data
    • Creating basic visualizations using Matplotlib and Seaborn
    • Identifying patterns and trends in data (e.g., seasonality, time-based trends)

Module 3: Feature Engineering and Data Preparation

  1. Feature Engineering
    • Creating new features based on existing data: Aggregation, time-based features, external factors
    • Extracting insights: Interaction terms, feature crossing, and transformations
    • Handling domain-specific features: Dealing with promotions, holidays, regional differences
  2. Data Transformation for Modeling
    • Encoding categorical features (One-Hot Encoding, Label Encoding, etc.)
    • Feature scaling: Standardization and normalization
    • Time-series data preparation: Creating lag features, rolling averages, and other transformations
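
A sketch of the time-series transformations named above, on an invented weekly series:

```python
import pandas as pd

# Weekly sales indexed by date
df = pd.Series(
    [100, 110, 95, 120, 130, 125],
    index=pd.date_range("2024-01-07", periods=6, freq="W"),
    name="sales",
).to_frame()

# Lag features: what were sales one and two weeks ago?
df["lag_1"] = df["sales"].shift(1)
df["lag_2"] = df["sales"].shift(2)

# Rolling average: smooth out week-to-week noise
df["rolling_3w"] = df["sales"].rolling(window=3).mean()

# A time-based feature the model can learn seasonality from
df["month"] = df.index.month
print(df)
```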
  3. Feature Selection and Dimensionality Reduction
    • Techniques for selecting important features: Correlation analysis, Feature Importance from models (e.g., Random Forest, XGBoost)
    • Dimensionality reduction: PCA (Principal Component Analysis), t-SNE for visualization
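
A brief sketch of both ideas on synthetic data: tree-based feature importance as a selection signal, and PCA for a 2-D view:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data with 8 features
X, y = make_regression(n_samples=200, n_features=8, noise=5, random_state=0)

# Feature importance from a random forest, most important first
forest = RandomForestRegressor(random_state=0).fit(X, y)
print(np.argsort(forest.feature_importances_)[::-1])

# PCA: project the 8 features onto 2 components for visualization
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape, pca.explained_variance_ratio_)
```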

Module 4: Machine Learning

  1. Introduction to Supervised Learning
    • Regression Models: Linear Regression, Ridge and Lasso Regression
    • Classification Models: Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines
    • Model evaluation metrics: MAE, MSE, RMSE, R², AUC, Accuracy, Precision, Recall, F1-score
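
The sketch below trains one of these regressors on synthetic data and computes the regression metrics listed above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data: sales as a noisy linear function of ad spend
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(200, 1))
y = 3.5 * X[:, 0] + 20 + rng.normal(0, 10, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("MAE: ", mean_absolute_error(y_test, pred))
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
print("R2:  ", r2_score(y_test, pred))
```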
  2. Unsupervised Learning
    • Clustering Techniques: K-Means, DBSCAN, Hierarchical Clustering
    • Dimensionality Reduction: PCA, t-SNE, UMAP
    • Anomaly Detection: Isolation Forest, One-Class SVM
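
For instance, a minimal K-Means run on synthetic data (three well-separated blobs):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic customer-like data in three natural groups
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# K-Means assigns each point to one of k clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # cluster label for the first 10 points
print(kmeans.cluster_centers_)    # the three learned centroids
```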
  3. Advanced Machine Learning Algorithms
    • Ensemble Learning: Random Forests, AdaBoost, Gradient Boosting, and XGBoost
    • Model stacking and blending for improved predictions
    • Hyperparameter tuning: GridSearchCV and RandomizedSearchCV
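
A minimal GridSearchCV sketch on synthetic data, cross-validating a small hyperparameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Search a small grid of hyperparameters with 5-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```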
  4. Time Series Analysis (Optional)
    • Time Series Forecasting: ARIMA, Exponential Smoothing, and Facebook Prophet
    • Handling seasonality, trends, and cyclical data

Module 5: Data Visualization and Reporting

  1. Visualizing Data with Python
    • Introduction to visualizing data using Matplotlib, Seaborn, and Plotly
    • Time series visualizations, trend analysis, and seasonal effects
    • Geospatial visualizations for location-based analysis (e.g., store sales by region)
  2. Creating Interactive Dashboards
    • Using Streamlit or Dash to create interactive web applications
    • Creating dashboards for sales trends, predictions, and insights
    • Integrating with external data sources for real-time updates
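
A bare-bones Streamlit sketch (made-up data; assumes a recent Streamlit version; run with streamlit run app.py):

```python
# app.py
import pandas as pd
import streamlit as st

df = pd.DataFrame({
    "outlet": ["A", "A", "B", "B"],
    "month":  ["Jan", "Feb", "Jan", "Feb"],
    "sales":  [100, 120, 80, 95],
})

st.title("Chuimatt Sales Dashboard")

# Let stakeholders filter the data interactively
outlet = st.selectbox("Outlet", df["outlet"].unique())
filtered = df[df["outlet"] == outlet]

st.metric("Total sales", int(filtered["sales"].sum()))
st.bar_chart(filtered, x="month", y="sales")
```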
  3. Reporting Insights and Presenting Results
    • Generating business insights from model predictions
    • Communicating findings in business-friendly language
    • Creating reports and presentations with Power BI or Tableau (optional)
  4. Interactive Dashboards (Advanced)
    • Advanced dashboards with Kibana, Elasticsearch, or other reporting tools for large datasets
    • Building dashboards that allow users to query data in real-time

Module 6: Model Deployment and Monitoring

  1. Model Deployment
    • Introduction to deploying models: why and when to deploy them
    • Exporting models using Pickle or Joblib
    • Building APIs using Flask or FastAPI for serving models
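
A minimal FastAPI sketch for serving a previously saved model (model.joblib and the feature names are hypothetical; run with uvicorn serve:app):

```python
# serve.py
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")   # hypothetical file saved via joblib.dump

class SalesFeatures(BaseModel):
    ad_spend: float
    is_holiday: int

@app.post("/predict")
def predict(features: SalesFeatures):
    # Assumes a scikit-learn-style model trained on these two features
    X = [[features.ad_spend, features.is_holiday]]
    return {"predicted_sales": float(model.predict(X)[0])}
```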
  2. Cloud Deployment
    • Deploying models on cloud platforms like Heroku, AWS, or Google Cloud
    • Setting up CI/CD pipelines for continuous integration and delivery of updates
  3. Model Monitoring and Maintenance
    • Monitoring model performance in production (drift detection, retraining models)
    • Setting up alerts and logs for production models
    • Retraining models as new data becomes available

Module 7: Capstone Project

  1. Project Planning
    • Overview of the final project: What is the business problem? What datasets are provided?
    • Setting up project scope, timeline, and deliverables
  2. Data Collection and Preparation
    • Working with real-world datasets (similar to the Chuimatt Supermarket case)
    • Collecting, cleaning, and preparing data for modeling
  3. Machine Learning Modeling
    • Building predictive models based on the business problem
    • Experimenting with different algorithms and techniques
  4. Reporting and Visualization
    • Creating dashboards and presenting insights from the model
    • Reporting business recommendations based on data science results
  5. Final Deliverables
    • Submitting code, model, and reports
    • Presentation of the solution to the client (instructor/peers)

Tools & Technologies

  1. Python: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Plotly, Dash, Flask, PySpark
  2. SQL: SQLAlchemy, SQLite, PostgreSQL for querying data
  3. Machine Learning: Regression, Classification, Time-Series, Ensemble Models, Hyperparameter Tuning
  4. Visualization: Matplotlib, Seaborn, Plotly, Dash, Streamlit, Kibana, Power BI/Tableau (optional)
  5. Deployment: Flask, FastAPI, Heroku, AWS, Google Cloud
  6. Version Control: Git, GitHub for collaborative development


Program Expectations

By the end of this course, here’s what you’ll be able to achieve:

  1. Gain proficiency in data science methodologies for solving real-world business problems.
  2. Learn how to prepare, analyze, and transform data for predictive modeling.
  3. Understand how to build machine learning models and evaluate their performance.
  4. Develop the skills to visualize data and present insights to business stakeholders.
  5. Be capable of deploying machine learning models and maintaining them in a production environment.
  6. Apply the learning to real-world projects like sales prediction, customer segmentation, and anomaly detection.