
Data Science Curriculum.

Developing the capacity to innovate.

Why Data Science?

Today most companies use data to formulate strategies before making any major business decision. This helps a company acquire new customers, serve existing clients better and more efficiently, identify inefficiencies in the business, and cut costs.


Program Expectations

Just like the witchdoctors in Tanzania (😂😂😂), imagine having the skills to identify where a business went wrong, what it is not doing right, and how to fix it; to understand what customers need and what they prefer under which conditions, and their consumption patterns; to work out which factors affect the business and under what variables; to correctly predict sales for the next year; and to build a web app to view the insights.

Requirements

At least a Core i5 computer with 8 GB of RAM. Prior programming experience isn't required.

Student to Teacher Ratio

The student-to-teacher ratio is 6:1 for online classes and 12:1 for onsite classes.

How Much & How Long

The program costs $240 (USD) per month for 5 months.

Method of delivery

Project-based learning delivered through onsite or virtual classes.

Classes

Monday to Friday, 2 hours a day, with support throughout the day.

Graduating startup funding

Currently we fund the best startup in class with $15,000 for 10% equity.

Data Science Curriculum
01 Data Science
1. The Problem Statement
Here we will assume you have been contracted as a data scientist by Chuimatt Supermarkets, a failing business that wants to find out where it went wrong and how to fix it.
What is your Task?
  • The buyers at Chuimatt need your help
  • The data at Chuimatt has become so large it's killing their spreadsheets
  • They want a system that can help identify sales trends and predict which department (e.g. Foods or Clothes) will be affected and to what extent
  • They also want to be able to easily query the sales data (their current system is a pain in the ass)
  • They are only giving you 3 data files to work with

  • The Data (What is in those 3 files?)
  • Historical sales data for 50 outlets across East Africa
  • Each outlet contains a different number of departments
  • Promotions are run throughout the year preceding major holidays, including Christmas, Dala 7s & Masaku 7s, Back to School, Black Fridays and Githeri-man day (hehe)
  • The data is provided in an SQLite database with 3 tables (see the loading sketch after this list):
    - Features table: additional data related to the outlet, department & regional activity for the given dates
    - Sales table: historical sales data covering Jan 2017 - June 2019 (and oh boy, there was a general election during that time)
    - Outlets table: anonymized information about the 50 outlets, indicating the type and size of each store
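
As a first step, the three tables can be pulled out of the SQLite file straight into pandas DataFrames. A minimal loading sketch, assuming the database file is called chuimatt.db and the tables are named features, sales and outlets (the actual file and table names may differ):

    import sqlite3
    import pandas as pd

    # Open the SQLite database Chuimatt handed over (file name is an assumption)
    conn = sqlite3.connect("chuimatt.db")

    # Pull each table into its own DataFrame (table names are assumptions)
    features = pd.read_sql("SELECT * FROM features", conn)
    sales = pd.read_sql("SELECT * FROM sales", conn)
    outlets = pd.read_sql("SELECT * FROM outlets", conn)
    conn.close()

    # A quick first look at what we are working with
    print(sales.shape, features.shape, outlets.shape)
    print(sales.head())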

    What are this project's challenges?
  • You are going to make decisions with limited history
  • Seasonality might (and probably does) affect sales
  • A number of (measured) factors can potentially influence sales, such as the election, cost of fuel, weather, holidays, etc. (a small feature-engineering sketch follows this list)
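
One common way to deal with seasonality and the measured external factors is to turn them into explicit model features. A small feature-engineering sketch, building on the DataFrames loaded above; the date, outlet, fuel price and holiday column names are assumptions:

    import pandas as pd

    # sales and features are the DataFrames loaded from SQLite earlier
    sales["date"] = pd.to_datetime(sales["date"])
    features["date"] = pd.to_datetime(features["date"])

    # Date-derived features capture seasonality
    sales["month"] = sales["date"].dt.month
    sales["week_of_year"] = sales["date"].dt.isocalendar().week

    # Join in external factors such as fuel price and holiday flags
    # (outlet_id, fuel_price and is_holiday are assumed column names)
    model_data = sales.merge(features, on=["outlet_id", "date"], how="left")
    print(model_data[["date", "month", "week_of_year"]].head())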

  • Now what are Chuimatt's requests? What do they want you to do?
  • Using the dataset provided, find out where they went wrong (why they are not making money and are shutting down some outlets)
  • Predict department-wide sales for each store for the next year (see the forecasting sketch after this list)
  • Show any effects of markdowns on holiday weeks
  • Provide recommended actions based on the insights you gain, prioritized by business impact
  • Make the analysis available via a webpage
  • Provide a web UI to view the sales data.
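
The prediction request is a supervised regression problem: given an outlet, a department and a week, estimate the sales figure. A minimal scikit-learn sketch, assuming the prepared model_data from the previous step has a weekly_sales target and the listed feature columns (all names are assumptions):

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    # Feature and target columns are assumptions about the prepared data
    feature_cols = ["outlet_id", "dept_id", "month", "week_of_year",
                    "fuel_price", "is_holiday"]
    X = model_data[feature_cols]
    y = model_data["weekly_sales"]

    # For a real forecast, prefer a time-based split over a random one
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    # Check the error on held-out data before trusting next year's forecast
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))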
2. Getting Started
  • Using the REPORT Framework
  • Creating Actionable Requirements
3. Python Libraries, Docker, Spark and Elasticsearch
  • Your Machine Learning Toolbox (a short example follows this list)
  • Jupyter Notebook
  • Pandas
  • PivotTable.js
  • Scikit-learn
  • Matplotlib
  • SQL
  • Docker
  • Spark
  • Elasticsearch
  • Logstash
  • Kibana
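
Most of these tools meet inside a Jupyter notebook. As a small taste, the sketch below pivots the prepared sales data with pandas and plots it with Matplotlib; model_data and its column names come from the earlier sketches and are assumptions:

    import matplotlib.pyplot as plt

    # Total sales per month and department (column names are assumptions)
    pivot = model_data.pivot_table(values="weekly_sales",
                                   index="month",
                                   columns="dept_id",
                                   aggfunc="sum")

    # Plot the seasonal pattern for each department
    pivot.plot(figsize=(10, 5), title="Monthly sales by department")
    plt.ylabel("Sales")
    plt.show()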
4. Machine Learning Algorithms
  • Supervised Learning
  • Supervised vs. Unsupervised Learning (see the sketch after this list)
  • Unsupervised Learning
  • Deep Learning
  • Deep Learning Frameworks
  • Machine Learning Toolbox Overview
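
To make the supervised vs. unsupervised distinction concrete, here is a tiny scikit-learn sketch: a regressor learns from labelled examples, while a clustering algorithm finds groups without any labels (the toy numbers are made up purely for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.cluster import KMeans

    # Supervised: every training example comes with the answer (y)
    X = np.array([[1], [2], [3], [4]])
    y = np.array([10, 20, 30, 40])
    reg = LinearRegression().fit(X, y)
    print(reg.predict([[5]]))   # roughly 50

    # Unsupervised: no labels, the algorithm groups similar points on its own
    points = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    print(labels)               # two clusters, e.g. [0 0 1 1]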
5. Data Gathering & Exploration
  • Connecting to APIs (see the sketch after this list)
  • Introduction to Web Scraping
  • Automating Local Data Collection
  • Importing Data from SQL into Pandas
  • Exploring Data using Jupyter Notebook, Pandas and more
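
Connecting to an API usually boils down to an HTTP request followed by loading the JSON response into pandas for exploration. A minimal sketch with the requests library; the endpoint URL is a hypothetical placeholder:

    import requests
    import pandas as pd

    # Hypothetical public-holidays endpoint, used purely for illustration
    response = requests.get("https://api.example.com/holidays",
                            params={"country": "KE", "year": 2019})
    response.raise_for_status()

    # Turn the JSON payload into a DataFrame for exploration in Jupyter
    holidays = pd.DataFrame(response.json())
    print(holidays.head())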
6. Algorithm Selection and Building the System
  • How to Select Which Algorithm(s) to Test and Use
  • Challenges Encountered When Creating Models
  • Using PySpark to Prepare Data - Part 1
  • Using PySpark to Prepare Data - Part 2
  • Storing Processed Data in Elasticsearch (a pipeline sketch follows this list)
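
A rough sketch of the pipeline shape for this step: aggregate the raw sales with PySpark, then push the processed rows into Elasticsearch with the official Python client. The CSV export, column names, index name and local Elasticsearch URL are all assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from elasticsearch import Elasticsearch, helpers

    spark = SparkSession.builder.appName("chuimatt-prep").getOrCreate()

    # Column names are assumptions about an exported copy of the sales table
    sales = spark.read.csv("sales.csv", header=True, inferSchema=True)
    weekly = (sales.groupBy("outlet_id", "dept_id", "week")
                   .agg(F.sum("weekly_sales").alias("total_sales")))

    # Index the processed rows into a local Elasticsearch instance
    es = Elasticsearch("http://localhost:9200")
    actions = ({"_index": "chuimatt-sales", "_source": row.asDict()}
               for row in weekly.toLocalIterator())
    helpers.bulk(es, actions)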
7. Using Python and Kibana to Report your Results
  • Using Python to Analyze Data
  • Publishing Your Analysis with Python (see the sketch after this list)
  • Performing Analysis Using Interactive Dashboards in Kibana
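
Publishing the analysis can start as simply as exporting the key chart and table from the notebook so the webpage can serve them, while the Kibana dashboards sit directly on top of the Elasticsearch index populated earlier. A small sketch reusing the monthly pivot from the exploration step (names are assumptions):

    import matplotlib.pyplot as plt

    # Reuse the monthly pivot built in the exploration sketch
    ax = pivot.plot(figsize=(10, 5), title="Monthly sales by department")
    ax.set_ylabel("Sales")

    # Export artefacts the webpage can serve directly
    plt.savefig("monthly_sales.png", dpi=150, bbox_inches="tight")
    pivot.to_html("monthly_sales_table.html")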