Week 2 — Data understanding

Novia Pratiwi - est.2021
2 min read · Apr 2, 2020

Online Education in the Times of COVID-19

Learn about tools, k-nearest neighbor, and the CRISP-DM model

TensorFlow: The content focuses on TensorFlow updates for researchers, production scaling, and improvements across platforms.

TensorFlow

  • Deep learning library
  • Originated at Google
  • Widely used in industry and research, partly because of Google’s backing
  • Popular with companies such as eBay, Twitter, and Airbnb
  • Enterprise: scales well in production with large data
  • Reportedly easy to learn, so novices can get started quickly
  • Allows for the creation of dataflow graphs (for example, neural nets)
  • Multi-language APIs (Python, C++, Java, etc.)
  • Open source

Business problem: The client wants to better support the students in the class. At the start of the semester we usually get a list of names and student IDs.

  • Identify the data source, and evaluate its quality and the attributes available.
  • Business question: how can we better support students? This means identifying students who are having difficulty in the course and are likely to fail.
  • Figure out who is in the class, the previous subjects they have taken and their marks, and compare these against previous students and their results.
  • Data preparation: select the data identified above. Do the usual checks for data integrity, null values, etc. Do we need to augment our data with data from other sources?
  • Clean the data (remove duplicates and incomplete records).
  • Modeling: build a model based on previous students’ data, reserving some data for validation. The idea is to use marks in previous subjects to predict the mark in this course. Use the test data to validate, and rework the model if the results are poor.
  • Evaluation: check with the client whether the model is giving them the information they require. Is it enough that we can identify students likely to fail?
  • Deployment: a front end to run the model and capture results, with the ability to compare predictions with actual results. A final report explains the assumptions in the model, its anticipated limitations, and any parameters that can be tweaked.
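
The data preparation, modeling, and evaluation steps above can be sketched in plain Python. This is only a rough illustration under stated assumptions, not the method used in class: the record layout (`student_id`, `prior_marks`, `final_mark`), the sample values, the pass mark of 50, and the choice of k-nearest neighbor (one of this week's topics) as the model are all made up for the example.

```python
import math

PASS_MARK = 50  # assumed pass threshold, not taken from the course


def is_complete(record):
    """A record is usable only if every field is present and non-null."""
    marks = record.get("prior_marks")
    return (
        record.get("student_id") is not None
        and record.get("final_mark") is not None
        and bool(marks)
        and all(m is not None for m in marks)
    )


def clean(records):
    """Drop incomplete records and duplicate student IDs (first one is kept)."""
    seen, cleaned = set(), []
    for record in records:
        if not is_complete(record) or record["student_id"] in seen:
            continue
        seen.add(record["student_id"])
        cleaned.append(record)
    return cleaned


def knn_predict(train, prior_marks, k=3):
    """Predict a final mark as the mean mark of the k most similar past students."""
    nearest = sorted(
        train, key=lambda r: math.dist(r["prior_marks"], prior_marks)
    )[:k]
    return sum(r["final_mark"] for r in nearest) / len(nearest)


def mean_abs_error(train, held_out, k=3):
    """Validate on reserved data: average gap between predicted and actual marks."""
    errors = [
        abs(knn_predict(train, r["prior_marks"], k) - r["final_mark"])
        for r in held_out
    ]
    return sum(errors) / len(errors)


def at_risk(train, prior_marks, k=3):
    """Flag a current student whose predicted mark is below the pass mark."""
    return knn_predict(train, prior_marks, k) < PASS_MARK


# Hypothetical data for previous students (marks in two earlier subjects).
past_students = [
    {"student_id": 1, "prior_marks": [85, 90], "final_mark": 88},
    {"student_id": 2, "prior_marks": [45, 50], "final_mark": 42},
    {"student_id": 3, "prior_marks": [60, 65], "final_mark": 63},
    {"student_id": 3, "prior_marks": [60, 65], "final_mark": 63},    # duplicate
    {"student_id": 4, "prior_marks": [40, None], "final_mark": 38},  # incomplete
    {"student_id": 5, "prior_marks": [48, 42], "final_mark": 45},
    {"student_id": 6, "prior_marks": [75, 70], "final_mark": 72},
    {"student_id": 7, "prior_marks": [52, 47], "final_mark": 49},
]

data = clean(past_students)
train, validation = data[:-2], data[-2:]  # reserve the last two for validation

print("validation error:", mean_abs_error(train, validation))
print("at risk:", at_risk(data, [47, 49]))
```

If the validation error is too large for the client's purposes, that is the signal to rework the model, for instance by trying a different `k` or adding attributes from other data sources.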

Attribute Types:

  1. Color (‘blue’, ‘red’): nominal (only = and ≠ apply)
  2. Greyscale values (‘white’, ‘light grey’): ordinal (ordered from light to dark)
  3. Grade (‘HD’, ‘D’, ‘C’): ordinal (> and < apply)
  4. FoR codes (Field of Research)
  5. Marks
  6. Surname: nominal
  7. Country of origin: nominal
  8. Age as a number: ratio (so interval, ordinal, and nominal comparisons also apply)
  9. Age as ‘child’, ‘teenager’, ‘adult’, ‘middle aged’: ordinal
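
To make the difference between these types concrete, here is a small Python sketch of which operations are meaningful for each. The grade ordering (C below D below HD) is assumed from the usual grading scale, and the age cut-offs in `AGE_BANDS` are hypothetical values chosen only for illustration.

```python
# Nominal: only equality checks are meaningful.
color_a, color_b = "blue", "red"
same_color = color_a == color_b

# Ordinal: values can be ranked, but the gaps between ranks carry no meaning.
GRADE_ORDER = ["C", "D", "HD"]  # ascending; assumed ordering


def grade_rank(grade):
    """Position of a grade in the assumed ordering."""
    return GRADE_ORDER.index(grade)


def is_higher_grade(a, b):
    """True if grade a ranks above grade b."""
    return grade_rank(a) > grade_rank(b)


# Ratio: a number with a true zero, so differences and ratios both make sense.
def age_ratio(a, b):
    return a / b


# A ratio attribute can be binned into an ordinal one (the reverse loses nothing
# we did not already lack). Upper bounds are exclusive and hypothetical.
AGE_BANDS = [(13, "child"), (20, "teenager"), (40, "adult")]


def age_band(age):
    """Map a numeric age to an ordered category."""
    for upper, label in AGE_BANDS:
        if age < upper:
            return label
    return "middle aged"


print(sorted(["HD", "C", "D"], key=grade_rank))
print(age_band(15), age_ratio(30, 15))
```

Note that sorting grades needs the explicit ranking: sorting the strings alphabetically would not reflect their real order.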

Look at some datasets and identify the types of some of their attributes.
