Data Science Foundations :: Ubunifu College

Duration: 3 months
Sessions: 2 hours a day
When: Mon - Fri
Delivery: Virtual or In-Person
Certificate: Yes

Data Science Foundations

About the Program

Imagine you're a data scientist at a microfinance institution that specializes in issuing hire-purchase loans for bikes and cars. Your mission is to enable the institution to make informed, data-driven lending decisions, ensuring that only creditworthy applicants are approved while maintaining a streamlined, compliant, and secure loan process. Your responsibilities extend from analyzing applicant data to developing a scoring model that minimizes default risk and supports the institution’s growth.

Requirements

A computer with at least an Intel Core i5 processor and 8 GB of RAM. No prior programming experience is required.

Student-to-teacher ratio: 10:1

Curriculum

1. Learning Python for Data Science

Module 1: Learning Python for Data Science
Overview: This module is designed to introduce students to Python programming from scratch, with a focus on how Python is used in Data Science. By the end of this module, students will have a solid foundation in Python and be ready to dive into more advanced data science topics in subsequent modules.
1.1 Introduction to Python
- What is Python?
  - A versatile, beginner-friendly programming language used in a wide variety of fields, especially in data science and machine learning.
  - Python’s popularity in data science: Libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
- Why Python for Data Science?
  - Easy to learn and use: Clear, readable syntax.
  - Extensive libraries and frameworks for data manipulation, analysis, and machine learning.
  - Strong community support.
- Setting up Python Environment
  - Installing Python: Anaconda distribution (recommended for data science).
  - Installing Python on different platforms: Windows, macOS, Linux.
  - Setting up Jupyter Notebooks for interactive coding.
  - PyCharm or VS Code for writing Python scripts.
1.2 Python Basics: Syntax and Data Types
- Python Syntax and Structure
  - Writing Python code in a script or interactive environment.
  - Understanding indentation and code blocks.
  - Basic commands: Print statements, comments, basic input/output.
- Data Types in Python
  - Numbers: Integers (int), Floating-point numbers (float).
  - Strings: Text data (str).
  - Boolean: True or False (bool).
  - Lists: Ordered collection of elements (list).
  - Tuples: Immutable sequence (tuple).
  - Dictionaries: Key-value pairs (dict).
  - Sets: Unordered collection of unique elements (set).
- Basic Operations
  - Arithmetic operations (+, -, *, /).
  - String manipulation: Concatenation, slicing, formatting.
  - List operations: Indexing, slicing, appending.
  - Dictionary operations: Adding/removing keys, accessing values.
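
As a rough illustration of the data types and basic operations listed in 1.2, the sketch below works through a made-up loan applicant record. All names and values are hypothetical and chosen only to show the syntax.

```python
# Hypothetical applicant record; every value here is made up for illustration.
applicant_name = "Amina"          # str
monthly_income = 45000            # int
interest_rate = 0.14              # float
is_employed = True                # bool

# Arithmetic: the / operator always returns a float.
yearly_income = monthly_income * 12
average_weekly_income = yearly_income / 52

# String manipulation: concatenation, slicing, and f-string formatting.
greeting = "Hello, " + applicant_name
initials = applicant_name[:2]
summary = f"{applicant_name} earns {yearly_income} per year"

# List operations: appending, indexing, slicing.
payments = [1200, 1500, 900]
payments.append(1100)
first_payment = payments[0]
last_two = payments[-2:]

# Dictionary operations: adding, accessing, and removing keys.
applicant = {"name": applicant_name, "employed": is_employed}
applicant["income"] = monthly_income
removed = applicant.pop("employed")

# Tuples are immutable; sets keep only unique elements.
location = (-1.29, 36.82)
loan_types = {"bike", "car", "bike"}

print(summary)
print(last_two, applicant, loan_types)
```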
1.3 Control Flow: Conditional Statements and Loops
- If-Else Statements
  - Writing conditional logic with if, elif, and else.
  - Understanding comparison operators (==, !=, <, >, etc.).
  - Using logical operators (and, or, not).
- Loops in Python
  - For Loop: Iterating over a sequence (e.g., list, string, range).
  - While Loop: Repeating a block of code as long as a condition is true.
  - Using break and continue to control loop flow.
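
The control-flow topics in 1.3 could be practiced with a small example like the one below. It is only a sketch: the credit scores and the approval thresholds are invented for illustration and do not represent a real scoring policy.

```python
# Hypothetical credit scores; the thresholds below are illustrative only.
scores = [720, 580, 640, -1, 810]

decisions = []
for score in scores:
    if score < 0:
        continue            # skip an invalid record and move to the next one
    if score >= 700:
        decisions.append("approve")
    elif score >= 600 and score < 700:
        decisions.append("review")
    else:
        decisions.append("decline")

# while loop: count how many monthly payments are needed to clear a balance.
balance = 1000
payment = 300
months = 0
while balance > 0:
    balance -= payment
    months += 1
    if months == 12:
        break               # safety cap: stop after one year regardless

print(decisions)
print(months)
```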
1.4 Functions in Python
- Defining Functions
  - The purpose of functions: Reusability, abstraction.
  - Function syntax: def function_name(parameters):, with arguments and return values.
  - Calling functions with arguments.
- Built-in Functions
  - Common Python built-in functions: print(), len(), type(), input().
  - Using list comprehensions to write concise loops and conditionals.
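
A minimal sketch of the ideas in 1.4 follows. The function name monthly_installment and its flat-rate formula are assumptions made for this example, not a prescribed lending calculation.

```python
def monthly_installment(principal, annual_rate, months):
    """Return a flat-rate monthly installment (simple interest, illustrative only)."""
    total_interest = principal * annual_rate * (months / 12)
    return (principal + total_interest) / months


# Calling the function with positional and keyword arguments.
installment = monthly_installment(120000, 0.10, months=12)

# Built-in functions: len() and type().
amounts = [50000, 120000, 80000]
number_of_loans = len(amounts)
installment_type = type(installment).__name__

# List comprehension: a loop and a condition in one expression.
large_loans = [amount for amount in amounts if amount >= 80000]

print(installment, number_of_loans, installment_type, large_loans)
```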
1.5 Python Libraries: Introduction to Data Science Tools
- Installing and Importing Libraries
  - Using pip and conda to install libraries.
  - Importing libraries in Python: import pandas as pd, import numpy as np, etc.
- Exploring Essential Data Science Libraries
  - Pandas: For data manipulation and analysis.
  - NumPy: For numerical operations and handling arrays.
  - Matplotlib: For basic data visualization (charts, graphs).
  - Seaborn: For statistical data visualization.
1.6 Introduction to Data Structures in Python
- Lists, Tuples, and Dictionaries
  - Creating, accessing, and manipulating lists, tuples, and dictionaries.
  - Common methods: .append(), .pop(), .remove(), .keys(), .values().
- Using Collections
  - Introduction to collections module: Counter, defaultdict.
  - Working with sets and handling duplicates.
  - Sorting and filtering collections using list comprehensions.
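
The collections topics in 1.6 can be sketched with the standard library alone. The loan records below are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical (loan_type, amount) records.
loans = [
    ("bike", 50000),
    ("car", 400000),
    ("bike", 65000),
    ("car", 350000),
    ("bike", 50000),
]

# Counter: how many loans of each type were issued.
type_counts = Counter(loan_type for loan_type, _ in loans)

# defaultdict: group amounts by loan type without checking for missing keys.
amounts_by_type = defaultdict(list)
for loan_type, amount in loans:
    amounts_by_type[loan_type].append(amount)

# A set removes duplicates; sorted() returns an ordered list.
unique_amounts = sorted({amount for _, amount in loans})

# Filter with a list comprehension, then sort from largest to smallest.
large_sorted = sorted(
    [amount for _, amount in loans if amount >= 100000], reverse=True
)

print(type_counts)
print(dict(amounts_by_type))
print(unique_amounts, large_sorted)
```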
1.7 Working with Data in Python
- Reading and Writing Files
  - Opening, reading, and writing files in Python.
  - Handling CSV files with Python’s built-in csv module.
  - Introduction to Pandas: Reading and writing data using pd.read_csv() and df.to_csv().
- Introduction to Pandas DataFrames
  - What is a DataFrame? An overview of how it is used to store tabular data.
  - Loading and inspecting data with Pandas (head(), tail(), info()).
  - Accessing specific rows, columns, and elements in a DataFrame.
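
The file-handling steps in 1.7 might look roughly like the sketch below. To keep it self-contained, the CSV content is a made-up string and io.StringIO stands in for a file on disk; with a real file, a path would be passed to open() or pd.read_csv() instead.

```python
import csv
import io

import pandas as pd

# Made-up CSV content; io.StringIO stands in for a file on disk.
raw = (
    "applicant,loan_type,amount\n"
    "A001,bike,50000\n"
    "A002,car,400000\n"
    "A003,bike,65000\n"
)

# Built-in csv module: each row becomes a dictionary of strings.
rows = list(csv.DictReader(io.StringIO(raw)))

# Pandas: the same data as a DataFrame, with numeric columns parsed automatically.
df = pd.read_csv(io.StringIO(raw))
print(df.head())
df.info()

# Accessing a column, a row by position, and a single element by label.
amounts = df["amount"]
first_row = df.iloc[0]
second_amount = df.loc[1, "amount"]

# Writing back out: with no path given, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
```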
1.8 Basic Data Analysis and Visualization
- Data Cleaning Basics
  - Handling missing data using Pandas (isnull(), dropna(), fillna()).
  - Dropping duplicates and cleaning strings.
  - Renaming columns and other data transformations.
- Basic Data Visualization
  - Plotting simple graphs with Matplotlib: Line plots, bar charts, histograms.
  - Plotting data with Seaborn: Scatter plots, heatmaps, and pair plots.
  - Visualizing data trends: Time-series, relationships between variables.
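
The cleaning steps in 1.8 could be sketched as follows. The DataFrame is invented for the example, and filling the missing amount with the median is just one possible choice, not a recommendation for every dataset.

```python
import pandas as pd

# Hypothetical messy data: stray whitespace, a duplicate row, a missing amount.
df = pd.DataFrame({
    "Applicant Name": [" amina ", "Brian", "Brian", "Chao"],
    "Amount": [50000.0, 120000.0, 120000.0, None],
})

# Count missing values per column.
missing_per_column = df.isnull().sum()

# Drop exact duplicate rows and give the columns simpler names.
df = df.drop_duplicates()
df = df.rename(columns={"Applicant Name": "name", "Amount": "amount"})

# Clean strings: strip whitespace and normalize capitalization.
df["name"] = df["name"].str.strip().str.title()

# Two ways to handle the missing amount: fill it in, or drop the row.
filled = df.fillna({"amount": df["amount"].median()})
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```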
1.9 Final Project: Python in Practice
- Mini Project: Analyzing a Simple Dataset
  - Students will use what they've learned to analyze a small dataset (e.g., sales data or customer data).
  - Tasks: Load the data, clean it, perform basic analysis, and create visualizations.
  - Presenting the findings: Using Python and Jupyter Notebooks to document the analysis.

2. Data Analysis Foundations

Introduction to Data Science
1. What is Data Science?
  - Definition, importance, and impact of data science across various industries
  - Introduction to Kaggle and other open data platforms
2. Data Science Workflow
  - Overview of the data science lifecycle: Data collection, cleaning, analysis, and reporting
  - Project example: Introduction to analyzing a cleaned dataset
Working with Excel Files in Python
1. Loading Data from Excel
  - Using Pandas to load Excel files: pd.read_excel()
  - Exploring data in Jupyter Notebooks
  - Identifying data types, basic statistics, and summaries
2. Handling Basic Data Operations
  - Filtering and selecting specific data
  - Aggregating data by groups (e.g., finding average sales per month, grouping by categories)
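
A minimal sketch of the filtering and grouping operations above is shown here. In class the table would typically come from pd.read_excel() with a real workbook; to stay self-contained, this example builds a small made-up table directly, and the column names are assumptions.

```python
import pandas as pd

# Stand-in for a table loaded with pd.read_excel(); values are invented.
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "category": ["bike", "car", "bike", "bike", "car"],
    "amount": [100, 400, 150, 50, 400],
})

# Data types and basic statistics.
print(sales.dtypes)
print(sales.describe())

# Filtering rows with a condition, then with two conditions and a column selection.
bikes = sales[sales["category"] == "bike"]
large_bikes = sales.loc[
    (sales["category"] == "bike") & (sales["amount"] >= 100),
    ["month", "amount"],
]

# Aggregating by group: average amount per month, total amount per category.
avg_by_month = sales.groupby("month")["amount"].mean()
total_by_category = sales.groupby("category")["amount"].sum()

print(avg_by_month)
print(total_by_category)
```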

3. Exploring Data with Pandas

3.1 Introduction to Pandas DataFrames
1. Understanding DataFrames
  - Key features and functions of a DataFrame in Pandas
  - DataFrame methods for analyzing data (describe(), info(), head(), tail())
2. Basic Data Manipulation
  - Selecting rows and columns, slicing data
  - Sorting, filtering, and using conditional selections
3.2 Descriptive Statistics and Data Exploration
1. Exploring Data Patterns
  - Generating summary statistics for numerical columns
  - Identifying distribution of categorical columns
  - Using groupby() to analyze grouped data
2. Finding Patterns and Trends
  - Calculating averages, totals, and percentages
  - Creating pivot tables to summarize data
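
The exploration steps in this module might be practiced on a small table like the one below. The data and column names are hypothetical; the calls shown (describe(), value_counts(), sort_values(), pivot_table()) are standard Pandas methods.

```python
import pandas as pd

# Hypothetical loan table.
loans = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South", "North"],
    "loan_type": ["bike", "car", "bike", "bike", "car", "bike"],
    "amount": [50, 400, 60, 40, 300, 70],
})

# Summary statistics for a numerical column.
summary = loans["amount"].describe()

# Distribution of a categorical column, expressed as percentages.
type_share = loans["loan_type"].value_counts(normalize=True) * 100

# Sorting from the largest amount to the smallest.
sorted_loans = loans.sort_values("amount", ascending=False)

# Pivot table: total amount for each region and loan type.
pivot = loans.pivot_table(
    index="region", columns="loan_type", values="amount", aggfunc="sum"
)

print(summary)
print(type_share)
print(pivot)
```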

4. Data Visualization with Matplotlib

Introduction to Data Visualization
1. Importance of Visualizing Data
  - Why visualization matters in data science
  - Types of charts and when to use them: bar, line, pie, scatter
2. Setting up Matplotlib
  - Introduction to Matplotlib library
  - Basic setup and plot customization: Titles, labels, colors, styles
Plotting Data with Matplotlib
1. Creating Common Visualizations
  - Bar charts: Showing categorical data
  - Line charts: Trends over time
  - Scatter plots: Examining relationships between variables
  - Histograms: Understanding distribution of numerical data
2. Advanced Visualization Techniques
  - Adding multiple data series to a plot
  - Customizing legends, annotations, and figure sizes
Introduction to Seaborn (Optional)
1. Adding Context with Seaborn
  - Introduction to Seaborn’s statistical plots: Count plots, box plots
  - Creating more detailed and aesthetically pleasing visualizations
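
As a sketch of the Matplotlib techniques in this module, the example below draws a line chart with two data series and a bar chart side by side. The monthly figures are invented. The Agg backend line is an assumption that the code runs as a script without a display; inside Jupyter it can be omitted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Made-up monthly loan counts.
months = ["Jan", "Feb", "Mar", "Apr"]
bike_loans = [12, 15, 11, 18]
car_loans = [5, 7, 6, 9]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart with two data series, a legend, labels, and an annotation.
ax_line.plot(months, bike_loans, marker="o", color="tab:blue", label="Bike loans")
ax_line.plot(months, car_loans, marker="s", color="tab:orange", label="Car loans")
ax_line.set_title("Loans issued per month")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Number of loans")
ax_line.legend()
ax_line.annotate("Peak", xy=(3, 18), xytext=(1.5, 17),
                 arrowprops={"arrowstyle": "->"})

# Bar chart comparing totals across categories.
totals = [sum(bike_loans), sum(car_loans)]
ax_bar.bar(["Bike", "Car"], totals, color=["tab:blue", "tab:orange"])
ax_bar.set_title("Total loans by type")
ax_bar.set_ylabel("Number of loans")

fig.tight_layout()
# In a script, call plt.show() or fig.savefig(...) to view or save the figure.
```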

5. Analyzing a Kaggle Dataset

Loading and Preparing the Kaggle Dataset
1. Loading Data in Jupyter Notebook
  - Using Pandas to load and inspect Kaggle Excel data
  - Examining the dataset’s variables, data types, and structure
2. Setting Analysis Goals
  - Defining key questions based on data columns and project objectives
  - Identifying which columns to focus on for analysis (e.g., sales trends, seasonal patterns)
Conducting Data Analysis
1. Descriptive Analysis
  - Calculating descriptive statistics for relevant columns
  - Identifying key metrics (e.g., highest and lowest sales, average values)
2. Trend Analysis
  - Using groupby() to track changes over time (e.g., monthly or yearly trends)
  - Visualizing trends in sales, profit, or other key metrics with line and bar charts
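
The descriptive and trend analysis steps above could be sketched like this. The column names order_date and sales are assumptions, since the actual Kaggle dataset is not specified here, and the values are invented.

```python
import pandas as pd

# Hypothetical sales records; a real project would load the Kaggle file instead.
sales = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-03", "2023-02-18", "2023-03-09",
    ]),
    "sales": [200, 300, 150, 450, 700],
})

# Key metrics: highest, lowest, and average sale.
highest = sales["sales"].max()
lowest = sales["sales"].min()
average = sales["sales"].mean()

# Trend analysis: total sales per calendar month.
sales["month"] = sales["order_date"].dt.to_period("M")
monthly = sales.groupby("month")["sales"].sum()

# Best and worst months, and month-over-month change in percent.
best_month = monthly.idxmax()
worst_month = monthly.idxmin()
monthly_change = monthly.pct_change() * 100

print(monthly)
print(best_month, worst_month)
print(monthly_change)
```

Calling monthly.plot(kind="line") or monthly.plot(kind="bar") on the result would be one way to turn the monthly totals into the charts described above.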
Generating Basic Reports
1. Creating a Simple Dashboard
  - Displaying key metrics and visualizations in a single notebook or PDF report
  - Annotating charts and adding insights to the report for clarity
2. Interpreting Results
  - Summarizing key findings from the data analysis
  - Suggesting potential business insights based on observed patterns (e.g., peak sales seasons, best-selling categories)

Program Expectations

By the end of this course, you will be able to:

- Confidently use Python and Pandas to analyze pre-cleaned data
- Generate descriptive statistics and identify trends within datasets
- Create and customize basic data visualizations in Matplotlib
- Present insights and findings effectively in a Jupyter Notebook report
- Apply foundational data science skills to real-world datasets, gaining a practical understanding of data analysis
