Neural Network — multilayer perceptron

Novia Pratiwi - est.2021
Feb 15, 2021 · 10 min read

Before we dive into creating our own neural network, it is worth understanding why neural networks have gained such an important foothold in machine learning and AI. The first reason is that neural networks are universal function approximators: given any function that we are trying to model, no matter how complex, a neural network can approximate that function to arbitrary precision. This has a profound implication for neural networks and AI in general. Assuming that any problem in the world can be described by a mathematical function (no matter how complex), we can use neural networks to represent that function, effectively modeling anything in the world. A caveat is that while such an approximating network is guaranteed to exist, the theory does not tell us how to find its weights in practice.

6.1 Neural Network Models

Recall the common scenario where we perform supervised learning: the goal is to understand and formulate the relationship between the observable data attributes X and the target values Y. When there is little prior knowledge about how X is connected to Y, or it is difficult to represent such knowledge (consider specifying how the pixels of an image relate to the visual concept of the presence of a puppy), the task becomes building a generic function approximator, i.e. we want to deliver a mechanism that can map an element

$$x \in X \mapsto y \in Y.$$

One of the most successful model families for such a generic approximation task is neural networks. The plain implementation of neural networks is also called the Multi-Layer Perceptron (MLP). It is not difficult to guess from this name that the neural network model family has a close connection to one of the linear models, the perceptron, which we learnt about in the previous part of this subject. A perceptron model is a straightforward formulation of the relationship between data attributes and the prediction target.

A perceptron computes a weighted sum of the data attributes and links the computed value to the prediction by simple thresholding (comparing to 0):

$$a = w_0 + w_1 \cdot x_1 + w_2 \cdot x_2 + \dots$$

and then makes a prediction according to whether $a > 0$. There are several possible extensions to this simple scheme, so that the model can classify data with a more sophisticated $(X, Y)$-relationship.
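As a quick illustration, here is a minimal sketch of this weighted sum and thresholding rule in NumPy; the weight and attribute values are made up for the example.

import numpy as np

w = np.array([0.5, -1.0, 2.0])  # illustrative weights [w0, w1, w2]
x = np.array([1.0, 0.3])        # illustrative attributes [x1, x2]
a = w[0] + w[1] * x[0] + w[2] * x[1]  # weighted sum, with w0 as the offset
prediction = 1 if a > 0 else 0  # simple thresholding at 0
print(a, prediction)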

• Extension Type-A: The output of the model can be linked to the value $a$ differently than by binary thresholding, e.g. a "soft" decision can be $\frac{1}{1 + e^{-a}}$. You are encouraged to plot the response of this function over a range of input values of $a$ to appreciate why it is called a "soft decision".
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(-5, 5, 0.1)  # a is an array of [-5, -4.9, ..., 4.8, 4.9]
# In practical applications, a would be computed from the input data x.
# Here we sweep a range of a values to illustrate the whole picture.
plt.plot(a, 1 / (1 + np.exp(-a)))
plt.xlabel("a")
plt.ylabel("soft decision 1 / (1 + exp(-a))")
plt.show()
• Extension Type-B: The output of the model can summarise the information from the input in a more generic way. E.g. it can include one input attribute directly in the output value: say, output $= a + x_5$, so the model describes the change from the input. (Usually, in such a construction, we would employ the same number of models as input data attributes, $h_i = a + x_i$, for $i = 1 \dots k$.)
• Extension Type-C: More complex models can be built via recursive constructions. There are two equivalent ways of understanding this idea. Top-to-bottom: when computing a perceptron model, we can replace the raw input data attributes with processed features; moreover, the processing can itself be done by other low-level perceptron models. Bottom-to-top: instead of letting the output of a perceptron model be directly linked to the target of prediction, we can feed those outputs into a higher layer of perceptron models:

$$a = w_1 \cdot h_1 + w_2 \cdot h_2 + \dots$$
$$h_1 = \varphi(u_{1,1} x_1 + u_{2,1} x_2 + \dots)$$
$$h_2 = \varphi(u_{1,2} x_1 + u_{2,2} x_2 + \dots)$$

• Both perspectives represent the same idea: perceptrons can be combined hierarchically to make complex data models, as illustrated by the equation block above and by the numeric sketch after this list. Note that the hierarchy is not limited to 2 layers. Modern deep neural networks can employ dozens or even more than 100 layers of perceptron-like models to compute highly complicated representations of the data before attempting the prediction tasks.
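Here is a minimal numeric sketch of this two-layer composition, assuming the sigmoid as the element-wise transform φ; all weight values are made up for illustration, and bias terms are omitted for brevity.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 2.0])   # three illustrative input attributes

# Low-level perceptrons: one column of U per hidden unit h_j,
# so h_j = sigmoid(u_{1,j} x_1 + u_{2,j} x_2 + u_{3,j} x_3)
U = np.array([[ 0.1, -0.3],
              [ 0.2,  0.4],
              [-0.5,  0.6]])
h = sigmoid(x @ U)               # [h1, h2]

# Higher-layer perceptron combining the processed features
w = np.array([0.7, -0.2])
a = w @ h                        # a = w1*h1 + w2*h2
print(h, a)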

Neural networks employ extensions of all three types. In this module, our study will focus on the Type-C extension. We will consider how to perform the computation to process data in a hierarchical neural network structure and how to train such complicated models.

In the following video, we discuss the extension of the perceptron model to neural networks.

6.2 Neural Network Computation

We next consider how neural networks process data. Neural networks employ a stage-wise processing pipeline: each layer performs one step, taking the output of the previous step and generating the information to feed into the next one. So we first consider the computation in one layer.

A layer consists of a set of perceptrons, each taking the same set of input variables. We first review the computation of one perceptron model (note the intentional "dummy attribute" 1 paired with $w_0$):

$$1 \cdot w_0 + x_1 \cdot w_1 + x_2 \cdot w_2 + \dots + x_k \cdot w_k \to a$$

This computation can be formulated as taking the inner product between two vectors; we arrange the quantities in the equation as follows.

$$[1, x_1, x_2, \dots, x_k] \cdot \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_k \end{bmatrix} \to a$$

(It will become clear soon that this seeming waste of space brings us great convenience.)
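In NumPy, this inner product is a single call to np.dot; the values below are made up for illustration.

import numpy as np

x_row = np.array([1.0, 0.3, -0.7])  # [1, x1, x2], including the dummy attribute
w_col = np.array([0.5, -1.0, 2.0])  # [w0, w1, w2]
a = np.dot(x_row, w_col)            # the same weighted sum as before
print(a)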

Now we have the computation of one perceptron formulated, while a layer of a neural network includes multiple perceptrons. An obvious solution would be to apply the naive computation scheme individually to all the perceptrons that make up a "layer" of a neural network. Let us give individual models subscripts to distinguish them from each other. Please be reminded that each perceptron model has its own set of parameters, so the parameters need unique indexes to be distinguished as well. In the simplest multi-perceptron case, we consider two perceptron models taking the same set of input attributes and generating two weighted-sum values as follows

$$[1, x_1, x_2, \dots, x_k] \cdot \begin{bmatrix} w_{0,1} & w_{0,2} \\ w_{1,1} & w_{1,2} \\ \vdots & \vdots \\ w_{k,1} & w_{k,2} \end{bmatrix} \to [a_1, a_2]$$

Moreover, the formulation allows us to naturally represent the computation for cases where there are multiple data samples. For example, to compute the network layer for two samples, we just need another set of indexes to distinguish individual data instances.

$$\begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} & \dots & x_k^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & \dots & x_k^{(2)} \end{bmatrix} \cdot \begin{bmatrix} w_{0,1} & w_{0,2} \\ w_{1,1} & w_{1,2} \\ \vdots & \vdots \\ w_{k,1} & w_{k,2} \end{bmatrix} \to \begin{bmatrix} a_1^{(1)} & a_2^{(1)} \\ a_1^{(2)} & a_2^{(2)} \end{bmatrix}$$

It is straightforward to generalise the formulation from here to cases with more than 2 models or 2 data samples.

$$[X]_{n \times (k+1)} \cdot [W]_{(k+1) \times m} \to [A]_{n \times m}$$
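As a quick sanity check of the shape bookkeeping, the sketch below multiplies a random $n \times (k+1)$ data matrix by a random $(k+1) \times m$ weight matrix; the sizes are arbitrary and chosen only for illustration.

import numpy as np

n, k, m = 4, 3, 2               # arbitrary sizes for illustration
X = np.random.randn(n, k + 1)   # n samples, k attributes plus the dummy attribute
X[:, 0] = 1.0                   # set the dummy attribute column to 1
W = np.random.randn(k + 1, m)   # one column of weights per perceptron
A = X @ W
print(A.shape)                  # (n, m): one row per sample, one column per model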

We have looked at the mathematical model of the multi-layer perceptron, i.e. neural networks. In the next section, we will write a computer program to implement the computation.

The following video explains the computational model and the implementation as matrix multiplication in neural networks.

6.3 Neural Network Computation Program

In a computer implementation, the model is simply represented as

import numpy as np

def nn_forward(X, W, b):
    """
    :param X: data matrix, [n x k], n instances, k attributes
    :param W: model parameter matrix W, [k x m]
    :param b: model parameter, the offset constants, [m]
    """
    A = np.dot(X, W)  # dot: product between matrices
    A += b  # the element-wise addition of m values per instance is
            # "broadcast" n times to apply to all data samples
    H = 1 / (1 + np.exp(-A))  # produce H by transforming A element-wise
                              # (the sigmoid here; other transforms are possible)
    return H
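A quick usage sketch with random data, continuing from the definition above (the sizes are arbitrary, and the sigmoid transform is the illustrative choice made in the function):

X = np.random.randn(5, 3)  # 5 instances, 3 attributes
W = np.random.randn(3, 2)  # a layer with 2 perceptrons
b = np.zeros(2)            # offsets, one per perceptron
H = nn_forward(X, W, b)
print(H.shape)             # (5, 2)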

Since the parameters W and b specify a network layer, they are usually considered part of the model representation, i.e. the hypothesis space of neural networks with a certain architecture design consists of the real-number vector/matrix spaces $\mathbb{R}^{k \times m} \oplus \mathbb{R}^m$. It is convenient to encapsulate the information of a model, i.e. a member hypothesis, as one program object. We can then take advantage of an object-oriented programming language to easily and safely manage the computational resources, the life cycle and the interface with other entities for the model object in a machine learning application. In Python, such a representation is demonstrated as follows

class Layer:
    ...
    def forward(self, X):
        A = np.dot(X, self.W)  # "self" refers to the Layer object,
        A = A + self.b         # which holds the parameters W and b
        ...
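A fuller sketch of such a class might look like the following; the initialisation scheme (small random weights, zero offsets) and the sigmoid transform are illustrative choices, not prescribed by the text.

import numpy as np

class Layer:
    def __init__(self, k, m):
        # illustrative initialisation: small random weights, zero offsets
        self.W = 0.01 * np.random.randn(k, m)
        self.b = np.zeros(m)

    def forward(self, X):
        A = np.dot(X, self.W) + self.b
        return 1 / (1 + np.exp(-A))  # element-wise sigmoid transform

# chaining layers: the output of one layer feeds the next
layer1 = Layer(3, 4)
layer2 = Layer(4, 2)
H = layer2.forward(layer1.forward(np.random.randn(5, 3)))
print(H.shape)  # (5, 2)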

One more thing … The matrix multiplication formulation reveals a very important role played by the element-wise transform interlaced between each pair of layers: without the non-linear transformation, multilayer processing could be represented by concatenating a series of matrix multiplications, which reduces to a single matrix multiplication by the associative law. The network would then effectively collapse to one layer.
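This collapse is easy to verify numerically. In the sketch below, two purely linear layers give exactly the same result as one layer built from the combined matrix; the sizes are arbitrary.

import numpy as np

X = np.random.randn(5, 3)
W1 = np.random.randn(3, 4)
W2 = np.random.randn(4, 2)

two_linear_layers = (X @ W1) @ W2  # no non-linearity in between
one_layer = X @ (W1 @ W2)          # associativity: one combined (3 x 2) matrix
print(np.allclose(two_linear_layers, one_layer))  # True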

The implementation will be explained in detail in the tutorial section later in this module.

We are a week away from the final project and the remaining course lectures.

Resources

Azure ML Studio — Implementing advanced algorithms

CS50 — You can also join their discord here: https://discord.gg/cs50

Content (topics)

• Deep Learning and neural network basics

• Traditional supervised and unsupervised learning overview

• Deep Learning and Convolutional Neural Networks (CNN) in detail

• Software tools for deep neural networks

• Image classification using Deep CNNs

• Object detection and localization using Deep CNNs

• Computer vision-based application development and deployment

1. Algorithm Descriptions

Here is an overview of the linear, nonlinear and ensemble algorithm descriptions:

Algorithm 1: Gradient Descent.

Algorithm 2: Linear Regression.

Algorithm 3: Logistic Regression.

Algorithm 4: Linear Discriminant Analysis.

Algorithm 5: Classification and Regression Trees.

Algorithm 6: Naive Bayes.

Algorithm 7: K-Nearest Neighbors.

Algorithm 8: Learning Vector Quantization.

Algorithm 9: Support Vector Machines.

Algorithm 10: Bagged Decision Trees and Random Forest.

Algorithm 11: Boosting and AdaBoost.

2. Algorithm Tutorials

Here is an overview of the step-by-step algorithm tutorials:

Tutorial 1: Simple Linear Regression using Statistics.

Tutorial 2: Simple Linear Regression with Gradient Descent.

Tutorial 3: Logistic Regression with Gradient Descent.

Tutorial 4: Linear Discriminant Analysis using Statistics.

Tutorial 5: Classification and Regression Trees with Gini.

Tutorial 6: Naive Bayes for Categorical Data.

Tutorial 7: Gaussian Naive Bayes for Real-Valued Data.

Tutorial 8: K-Nearest Neighbors for Classification.

Tutorial 9: Learning Vector Quantization for Classification.

Tutorial 10: Support Vector Machines with Gradient Descent.

Tutorial 11: Bagged Classification and Regression Trees.

Tutorial 12: AdaBoost for Classification.

Python tutorials — worth it!

1. PyCharm (JetBrains): https://www.jetbrains.com/pycharm/documentation/pycharm-videos.html

https://www.learnpython.org/en/Numpy_Arrays

https://www.learnpython.org/en/Pandas_Basics

2. Front-end with back-end skills development

https://www.jetbrains.com/pycharm/features/web_development.html

Angular — https://angular.io/tutorial/toh-pt0

Node.js & JavaScript packages

https://stackblitz.com/edit/angular-ugg8fq?file=src/app/app.module.ts

3. Activation function for deep learning

https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/

Keras — https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Tensorflow — https://hashnode.com/search?q=deep%20learning

4. Backpropagation — https://learning.oreilly.com/library/view/deep-learning-for/9781119543046/c06.xhtml

Multilayer perceptron with the Iris dataset, AdaBoost algorithms, decision trees, another decision tree, linear regression with Melbourne housing data, a neural network (still not finished), and MCL algorithms for social networks

Thank you if you are reading this to the end. Congrats, learner/reader!

I realise Medium is often used for UI/UX content, but for me it leans towards data science. Seriously, if you follow me, my posts are all over the place too: some might be about productivity and my personal thoughts, but most focus on data learning and my journey to landing an IT job. Technical Writing Bootcamp Task 4 by Ruth Ikegah: write about what motivates you to write or blog as a developer, and be very expressive. This article will serve as a reference whenever you face writer's block.

Remember when they say good things come in the most unlikely of situations? Well, for me this was sort of true. Before I delve into answering this ("my journey into tech"), I need to give a little bit of background information: I come from a not very technical background.

First, I blog to document my process as a newbie software developer. I create blog posts so I can tell people how I started learning software development: my mistakes and errors, the whole learning journey, the projects I work on, and much more :))

Connect with me via LinkedIn or Facebook if you would like to ask any questions!


Novia Pratiwi - est.2021

Curiosity to Data Analytics & Career Journey | Educate and inform myself and others about #LEARNINGTOLEARN and technology automation