Week 2 — Data understanding
Online Education in the Times of COVID-19
Learn about tools, k-nearest neighbors, and the CRISP-DM model

TensorFlow: The content focuses on TensorFlow updates for researchers, production scaling, improvements across platforms.
TensorFlow
- Deep learning library
- Originated at Google
- Widely used in industry and research, partly because of Google’s backing
- Popular with companies such as eBay, Twitter, and Airbnb
- Enterprise: scales well in production with large data
- Reportedly easy to learn, so novices can get started quickly
- Allows the creation of dataflow graphs (neural nets)
- Multi-language APIs (Python, C++, Java, etc.)
- Open source
Business problem: The client wants to better support the students in the class. At the start of the semester we usually get a list of names and student IDs.
- Identify the data source and evaluate its quality and the attributes available
- Business question: better support students -> identify students who are having difficulty in the course and are likely to fail
- Figure out who is in the class, which previous subjects they have taken and their marks, and compare against previous students and their results
- Data preparation: select the data identified above. Do the usual checks for data integrity, null values, etc. Do we need to augment our data with data from other sources?
- Clean the data (remove duplicates and incomplete records)
- Modeling: build a model based on previous students’ data (reserve some data for validation). The idea is to use marks in previous subjects to predict the mark in this course. Use test data to validate. Rework the model if results are poor.
- Evaluation: check with the client whether the model is giving them the information they require. Is it enough that we can identify students likely to fail?
- Deployment: front end to run the model and capture results, with the ability to compare predictions with actual results. A final report explains the assumptions in the model, its anticipated limitations, and any parameters that can be tweaked.
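The preparation, cleaning, and modeling steps above can be sketched in a few lines of plain Python. This is only an illustration, not the client's actual pipeline. The student records, the two prerequisite marks used as features, the pass mark of 50, and the helper names `clean` and `knn_predict` are all assumptions made up for the example. The model is a simple k-nearest neighbors average, chosen because it is one of this week's topics.

```python
from math import sqrt

# Hypothetical past students: (student_id, prerequisite A mark, prerequisite B mark, final mark)
raw_history = [
    ("s01", 82, 75, 80),
    ("s02", 45, 50, 42),
    ("s02", 45, 50, 42),     # duplicate row
    ("s03", 60, None, 58),   # incomplete row (missing mark)
    ("s04", 38, 41, 35),
    ("s05", 71, 68, 74),
    ("s06", 55, 49, 51),
    ("s07", 90, 88, 91),
    ("s08", 48, 52, 47),
]


def clean(rows):
    """Drop exact duplicates and rows with missing values, keeping the original order."""
    seen, kept = set(), []
    for row in rows:
        if row in seen or any(value is None for value in row):
            continue
        seen.add(row)
        kept.append(row)
    return kept


def knn_predict(train, features, k=3):
    """Predict a final mark as the mean final mark of the k most similar past students."""
    def distance(row):
        # Euclidean distance between the two prerequisite marks
        return sqrt(sum((a - b) ** 2 for a, b in zip(row[1:3], features)))

    nearest = sorted(train, key=distance)[:k]
    return sum(row[3] for row in nearest) / len(nearest)


history = clean(raw_history)

# Reserve the last two cleaned records for validation
train, validation = history[:-2], history[-2:]

# Mean absolute error on the held-out students; rework the model if this is too large
errors = [abs(knn_predict(train, row[1:3]) - row[3]) for row in validation]
mae = sum(errors) / len(errors)

PASS_MARK = 50  # assumed pass mark
new_students = {"n01": (47, 51), "n02": (85, 80)}  # hypothetical current class
at_risk = [sid for sid, feats in new_students.items()
           if knn_predict(train, feats) < PASS_MARK]

print("validation MAE:", round(mae, 1))
print("students flagged as at risk:", at_risk)
```

With so few records the validation error is not meaningful. The point is only the shape of the workflow: clean, split, fit, validate, then flag students whose predicted mark falls below the threshold.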

Attribute Types:
- Color (‘blue’, ‘red’): nominal (=, ≠)
- Greyscale values (‘white’, ‘light grey’): ordinal (can be mapped to a number)
- Grade (‘HD’, ‘D’, ‘C’): ordinal (<, >)
- FoR codes (Field of Research):
- Marks:
- Surname: nominal
- Country of origin: nominal
- Age as a number: ratio (interval, ordinal, and nominal comparisons also apply)
- Age as ‘child’, ‘teenager’, ‘adult’, ‘middle aged’: ordinal
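One way to see why the attribute type matters is to look at which summaries make sense for each type. The sketch below is illustrative only. The sample values, the grade ordering in `GRADE_ORDER`, and the age cut-offs in `age_group` are assumptions for the example, not definitions from the course.

```python
from statistics import mean, mode

# Nominal: only equality is meaningful, so the mode is the natural summary
colors = ["blue", "red", "blue"]
most_common_color = mode(colors)

# Ordinal: values can be ranked, but the distance between ranks is undefined
GRADE_ORDER = ["C", "D", "HD"]  # assumed ordering, lowest to highest
grades = ["HD", "C", "D", "D", "HD"]
ranked = sorted(grades, key=GRADE_ORDER.index)
median_grade = ranked[len(ranked) // 2]  # middle element of the ranked list

# Ratio: age as a number has a true zero, so differences and ratios both make sense
ages = [18, 21, 36]
mean_age = mean(ages)
age_ratio = ages[2] / ages[0]  # "twice as old" is a valid statement


def age_group(age):
    """Turn a ratio attribute into an ordinal one (cut-offs are assumed)."""
    if age < 13:
        return "child"
    if age < 20:
        return "teenager"
    if age < 40:
        return "adult"
    return "middle aged"


age_groups = [age_group(a) for a in ages]

print(most_common_color, median_grade, mean_age, age_ratio, age_groups)
```

Converting age to groups throws away information: the groups can still be ranked, but a mean or ratio of groups no longer makes sense.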

Look at datasets and identify the types of some of the attributes.