Data Science Foundations

About the Program

Imagine you're a data scientist at a microfinance institution that specializes in issuing hire-purchase loans for bikes and cars. Your mission is to enable the institution to make informed, data-driven lending decisions, ensuring that only creditworthy applicants are approved while maintaining a streamlined, compliant, and secure loan process. Your responsibilities extend from analyzing applicant data to developing a scoring model that minimizes default risk and supports the institution’s growth.

Requirements

A computer with at least an Intel Core i5 processor and 8 GB of RAM. No prior programming experience is required.

Student to Teacher Ratio of 10:1

Curriculum

  1. Module 1: Learning Python for Data Science

    Overview: This module introduces Python programming from scratch, with a focus on how Python is used in data science. By the end of the module, students will have a solid foundation in Python and be ready for the more advanced data science topics in later modules.

  2. 1.1 Introduction to Python
    • What is Python?
      • A versatile, beginner-friendly programming language used in a wide variety of fields, especially in data science and machine learning.
      • Python’s popularity in data science: Libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
    • Why Python for Data Science?
      • Easy to learn and use: Clear, readable syntax.
      • Extensive libraries and frameworks for data manipulation, analysis, and machine learning.
      • Strong community support.
    • Setting up Python Environment
      • Installing Python: Anaconda distribution (recommended for data science).
      • Installing Python on different platforms: Windows, macOS, Linux.
      • Setting up Jupyter Notebooks for interactive coding.
      • PyCharm or VS Code for writing Python scripts.
  3. 1.2 Python Basics: Syntax and Data Types
    • Python Syntax and Structure
      • Writing Python code in a script or interactive environment.
      • Understanding indentation and code blocks.
      • Basic commands: Print statements, comments, basic input/output.
    • Data Types in Python
      • Numbers: Integers (int), Floating-point numbers (float).
      • Strings: Text data (str).
      • Boolean: True or False (bool).
      • Lists: Ordered collection of elements (list).
      • Tuples: Immutable sequence (tuple).
      • Dictionaries: Key-value pairs (dict).
      • Sets: Unordered collection of unique elements (set).
    • Basic Operations
      • Arithmetic operations (+, -, *, /).
      • String manipulation: Concatenation, slicing, formatting.
      • List operations: Indexing, slicing, appending.
      • Dictionary operations: Adding/removing keys, accessing values.
  4. 1.3 Control Flow: Conditional Statements and Loops
    • If-Else Statements
      • Writing conditional logic with if, elif, and else.
      • Understanding comparison operators (==, !=, <, >, etc.).
      • Using logical operators (and, or, not).
    • Loops in Python
      • For Loop: Iterating over a sequence (e.g., list, string, range).
      • While Loop: Repeating a block of code as long as a condition is true.
      • Using break and continue to control loop flow.
  5. 1.4 Functions in Python
    • Defining Functions
      • The purpose of functions: Reusability, abstraction.
      • Function syntax: def function_name(parameters):, with arguments and return values.
      • Calling functions with arguments.
    • Built-in Functions
      • Common Python built-in functions: print(), len(), type(), input().
      • Working with list comprehensions for concise loops and conditionals.
  6. 1.5 Python Libraries: Introduction to Data Science Tools
    • Installing and Importing Libraries
      • Using pip and conda to install libraries.
      • Importing libraries in Python: import pandas as pd, import numpy as np, etc.
    • Exploring Essential Data Science Libraries
      • Pandas: For data manipulation and analysis.
      • NumPy: For numerical operations and handling arrays.
      • Matplotlib: For basic data visualization (charts, graphs).
      • Seaborn: For statistical data visualization.
  7. 1.6 Introduction to Data Structures in Python
    • Lists, Tuples, and Dictionaries
      • Creating, accessing, and manipulating lists, tuples, and dictionaries.
      • Common methods: .append(), .pop(), .remove(), .keys(), .values().
    • Using Collections
      • Introduction to collections module: Counter, defaultdict.
      • Working with sets and handling duplicates.
      • Sorting collections with sorted() and filtering them with list comprehensions.
  8. 1.7 Working with Data in Python
    • Reading and Writing Files
      • Opening, reading, and writing files in Python.
      • Handling CSV files with Python’s built-in csv module.
      • Introduction to Pandas: Reading and writing data using pd.read_csv() and df.to_csv().
    • Introduction to Pandas DataFrames
      • What is a DataFrame? An overview of how it is used to store tabular data.
      • Loading and inspecting data with Pandas (head(), tail(), info()).
      • Accessing specific rows, columns, and elements in a DataFrame.
  9. 1.8 Basic Data Analysis and Visualization
    • Data Cleaning Basics
      • Handling missing data using Pandas (isnull(), dropna(), fillna()).
      • Dropping duplicates and cleaning strings.
      • Renaming columns and other data transformations.
    • Basic Data Visualization
      • Plotting simple graphs with Matplotlib: Line plots, bar charts, histograms.
      • Plotting data with Seaborn: Scatter plots, heatmaps, and pair plots.
      • Visualizing data trends: Time-series, relationships between variables.
  10. 1.9 Final Project: Python in Practice
    • Mini Project: Analyzing a Simple Dataset
      • Students will use what they've learned to analyze a small dataset (e.g., sales data or customer data).
      • Tasks: Load the data, clean it, perform basic analysis, and create visualizations.
      • Presenting the findings: Using Python and Jupyter Notebooks to document the analysis.
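
As a rough illustration of the load, clean, and analyze steps in the mini project, here is a minimal sketch. The column names and values are made up for this example and are not course data; filling missing amounts with the median is just one possible choice, and the visualization step is left out.

```python
import io

import pandas as pd

# Made-up customer data standing in for a small CSV file (illustration only).
RAW_CSV = """Customer Name,category,amount
 Ada ,Bike,1200
Ben,Car,
Ben,Car,
Cy,bike,900
Dee,Car,1500
"""

# Load: pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect: first rows and the number of missing values per column.
print(df.head())
print(df.isnull().sum())

# Clean: rename a column, tidy strings, drop duplicate rows, fill missing amounts.
df = df.rename(columns={"Customer Name": "customer"})
df["customer"] = df["customer"].str.strip()
df["category"] = df["category"].str.strip().str.title()
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())  # assumption: median fill

# Analyze: total amount per category.
totals = df.groupby("category")["amount"].sum()
print(totals)
```

In a Jupyter Notebook, each step would typically sit in its own cell with a short note explaining what was done and why.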
  1. Introduction to Data Science
    1. What is Data Science?
      • Definition, importance, and impact of data science across various industries
      • Introduction to Kaggle and other open data platforms
    2. Data Science Workflow
      • Overview of the data science lifecycle: Data collection, cleaning, analysis, and reporting
      • Project example: Introduction to analyzing a cleaned dataset
  2. Working with Excel Files in Python
    1. Loading Data from Excel
      • Using Pandas to load Excel files: pd.read_excel()
      • Exploring data in Jupyter Notebooks
      • Identifying data types, basic statistics, and summaries
    2. Handling Basic Data Operations
      • Filtering and selecting specific data
      • Aggregating data by groups (e.g., finding average sales per month, grouping by categories)
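
The filtering and grouping operations listed above might look like the following sketch. The table is built in memory with made-up values so the example runs on its own; in class it would come from a workbook via pd.read_excel(), and the file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up sales records; with a real workbook this would be something like
# df = pd.read_excel("sales.xlsx")  # hypothetical file name
df = pd.DataFrame(
    {
        "month": ["Jan", "Jan", "Feb", "Feb", "Mar"],
        "category": ["bike", "car", "bike", "car", "bike"],
        "sales": [100.0, 300.0, 150.0, 250.0, 200.0],
    }
)

# Inspect column types and basic statistics.
print(df.dtypes)
print(df["sales"].describe())

# Filter: keep only rows where sales exceed 120.
large_sales = df[df["sales"] > 120]

# Aggregate: average sales per month and total sales per category.
avg_per_month = df.groupby("month", sort=False)["sales"].mean()
total_per_category = df.groupby("category")["sales"].sum()

print(large_sales)
print(avg_per_month)
print(total_per_category)
```

Passing sort=False keeps the months in the order they first appear rather than sorting them alphabetically.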
  1. 3.1 Introduction to Pandas DataFrames
    1. Understanding DataFrames
      • Key features and functions of a DataFrame in Pandas
      • DataFrame methods for analyzing data (describe(), info(), head(), tail())
    2. Basic Data Manipulation
      • Selecting rows and columns, slicing data
      • Sorting, filtering, and using conditional selections
  2. 3.2 Descriptive Statistics and Data Exploration
    1. Exploring Data Patterns
      • Generating summary statistics for numerical columns
      • Identifying distribution of categorical columns
      • Using groupby() to analyze grouped data
    2. Finding Patterns and Trends
      • Calculating averages, totals, and percentages
      • Creating pivot tables to summarize data
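
A small sketch of the selection, sorting, distribution, and pivot-table topics above, assuming a made-up table with region, product, and units columns (none of these names come from the course materials):

```python
import pandas as pd

# Made-up records for illustration only.
df = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "South"],
        "product": ["bike", "car", "bike", "bike", "car"],
        "units": [3, 1, 2, 4, 2],
    }
)

# Select rows by position (.iloc) and by a condition plus column label (.loc).
first_two_rows = df.iloc[:2]
south_units = df.loc[df["region"] == "South", "units"]

# Sort by units, largest first.
ranked = df.sort_values("units", ascending=False)

# Distribution of a categorical column as shares of the total.
product_share = df["product"].value_counts(normalize=True)

# Pivot table: total units for each region/product pair.
pivot = df.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum"
)

print(ranked.head())
print(product_share)
print(pivot)
```

The pivot table gives the same totals as a groupby() on both columns, but laid out as a grid that is easier to read in a report.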
  1. Introduction to Data Visualization
    1. Importance of Visualizing Data
      • Why visualization matters in data science
      • Types of charts and when to use them: bar, line, pie, scatter
    2. Setting up Matplotlib
      • Introduction to Matplotlib library
      • Basic setup and plot customization: Titles, labels, colors, styles
  2. Plotting Data with Matplotlib
    1. Creating Common Visualizations
      • Bar charts: Showing categorical data
      • Line charts: Trends over time
      • Scatter plots: Examining relationships between variables
      • Histograms: Understanding distribution of numerical data
    2. Advanced Visualization Techniques
      • Adding multiple data series to a plot
      • Customizing legends, annotations, and figure sizes
  3. Introduction to Seaborn (Optional)
    1. Adding Context with Seaborn
      • Introduction to Seaborn’s statistical plots: Count plots, box plots
      • Creating more detailed and aesthetically pleasing visualizations
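
The Matplotlib topics above (bar and line charts, multiple series, titles, labels, legends, annotations, figure size) can be sketched roughly as follows. The figures are made up for illustration, and the Agg backend line is only an assumption so the script runs without a display; it is not needed inside a Jupyter Notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Made-up monthly figures for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]
profit = [30, 45, 20, 60]

# Two side-by-side axes with a custom figure size.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per category (here, per month).
ax_bar.bar(months, sales, color="steelblue")
ax_bar.set_title("Sales by month")
ax_bar.set_xlabel("Month")
ax_bar.set_ylabel("Sales")

# Line chart with two data series and a legend.
ax_line.plot(months, sales, marker="o", label="Sales")
ax_line.plot(months, profit, marker="s", linestyle="--", label="Profit")
ax_line.set_title("Sales and profit over time")
ax_line.set_xlabel("Month")
ax_line.legend()

# Annotate the highest sales value; string categories sit at positions 0, 1, 2, ...
peak = sales.index(max(sales))
ax_line.annotate(
    "peak",
    xy=(peak, sales[peak]),
    xytext=(peak - 1, sales[peak]),
    arrowprops={"arrowstyle": "->"},
)

fig.tight_layout()
# In a notebook the figure renders inline; fig.savefig(...) would export it to a file.
```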
  1. Loading and Preparing the Kaggle Dataset
    1. Loading Data in Jupyter Notebook
      • Using Pandas to load and inspect Kaggle Excel data
      • Examining the dataset’s variables, data types, and structure
    2. Setting Analysis Goals
      • Defining key questions based on data columns and project objectives
      • Identifying which columns to focus on for analysis (e.g., sales trends, seasonal patterns)
  2. Conducting Data Analysis
    1. Descriptive Analysis
      • Calculating descriptive statistics for relevant columns
      • Identifying key metrics (e.g., highest and lowest sales, average values)
    2. Trend Analysis
      • Using groupby() to track changes over time (e.g., monthly or yearly trends)
      • Visualizing trends in sales, profit, or other key metrics with line and bar charts
  3. Generating Basic Reports
    1. Creating a Simple Dashboard
      • Displaying key metrics and visualizations in a single notebook or PDF report
      • Annotating charts and adding insights to the report for clarity
    2. Interpreting Results
      • Summarizing key findings from the data analysis
      • Suggesting potential business insights based on observed patterns (e.g., peak sales seasons, best-selling categories)


Program Expectations

By the end of this course, you will be able to:

  1. Confidently use Python and Pandas to analyze pre-cleaned data
  2. Generate descriptive statistics and identify trends within datasets
  3. Create and customize basic data visualizations in Matplotlib
  4. Present insights and findings effectively in a Jupyter Notebook report
  5. Apply foundational data science skills to real-world datasets, gaining a practical understanding of data analysis