Introduction to Deep Learning

What is Deep Learning?

Deep learning is a subset of machine learning that uses multi-layered neural networks to learn hierarchical representations directly from raw data.

Why Deep Learning?

Hand-crafted features are time-consuming, brittle, and not scalable in practice. Deep learning allows us to learn the underlying features directly from the data.

The Perceptron

The structural building block of deep learning.

$$ \overbrace{\hat{y}}^{\text{Output}} = \overbrace{g\left(\underbrace{w_0}_{\text{Bias}} + \sum_{i=1}^m \underbrace{x_i}_{\text{Input}} \underbrace{w_i}_{\text{Weight}}\right)}^{\text{Non-Linear Activation Function}} $$
$$ \hat{y}=g\left(w_0+\boldsymbol{X}^T \boldsymbol{W}\right) $$ $$ \text{where:} \quad \boldsymbol{X}=\left[\begin{array}{c}x_1 \\ \vdots \\ x_m\end{array}\right] \quad \text{and} \quad \boldsymbol{W}=\left[\begin{array}{c}w_1 \\ \vdots \\ w_m\end{array}\right] $$
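To make this concrete, here is a minimal NumPy sketch of the perceptron forward pass; the sigmoid choice for $g$ and the example values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # One common choice for the non-linear activation g
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(X, W, w0):
    # y_hat = g(w0 + X^T W): weighted sum of inputs, plus bias, through g
    return sigmoid(w0 + X @ W)

# Example with m = 3 inputs (values are illustrative)
X = np.array([0.5, -1.2, 3.0])
W = np.array([0.4, 0.1, -0.7])
w0 = 2.0
print(perceptron(X, W, w0))
```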

Activation Functions

Activation functions control the signaling between neurons and introduce nonlinearity, which allows the network to detect complex patterns in data.

$$ \hat{y}=\textcolor{DarkGoldenrod}{g}\left(w_0+\boldsymbol{X}^T \boldsymbol{W}\right) $$

Types of Activation Function
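Three common choices are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU):

$$ g(z)=\frac{1}{1+e^{-z}} \qquad g(z)=\tanh(z)=\frac{e^z-e^{-z}}{e^z+e^{-z}} \qquad g(z)=\max(0, z) $$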

Building Neural Networks with a Perceptron

Simplified Version of a Perceptron

$$ z=w_0+\sum_{j=1}^m x_j w_j $$

Simplified Version of Multi-Output Perceptron

All inputs are connected to all outputs; such layers are called Dense layers.

$$ z_\textcolor{DarkGoldenrod}{i}=w_{0, \textcolor{DarkGoldenrod}{i}}+\sum_{j=1}^m x_j w_{j, \textcolor{DarkGoldenrod}{i}} $$
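A minimal NumPy sketch of such a dense layer, computing every output $z_i$ at once in matrix form (the sizes and random initialization are illustrative assumptions):

```python
import numpy as np

def dense(X, W, w0):
    # z_i = w0_i + sum_j x_j * w_{j,i}, for all outputs i simultaneously
    return w0 + X @ W  # X: (m,), W: (m, n_out), w0: (n_out,)

rng = np.random.default_rng(0)
m, n_out = 3, 2                   # illustrative layer sizes
X = rng.normal(size=m)            # inputs x_1 .. x_m
W = rng.normal(size=(m, n_out))   # weights w_{j,i}
w0 = np.zeros(n_out)              # biases w_{0,i}
print(dense(X, W, w0))            # outputs z_1, z_2
```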

Single Layer Neural Network
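As a sketch with one hidden layer (the layer superscripts are a notational convention added here), the hidden units $z_i$ feed a second dense transformation that produces the outputs:

$$ z_i=w_{0, i}^{(1)}+\sum_{j=1}^m x_j w_{j, i}^{(1)} \qquad \hat{y}_i=g\left(w_{0, i}^{(2)}+\sum_{j=1}^{d_1} g\left(z_j\right) w_{j, i}^{(2)}\right) $$

where $d_1$ denotes the number of hidden units.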

Deep Neural Network
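A deep network stacks many such layers; with the same assumed superscript notation, layer $k$'s pre-activations are computed from the activated outputs of layer $k-1$:

$$ z_{k, i}=w_{0, i}^{(k)}+\sum_{j=1}^{n_{k-1}} g\left(z_{k-1, j}\right) w_{j, i}^{(k)} $$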

Loss Functions

The loss function measures the cost incurred by the network's prediction errors.

$$ \mathcal{L}\left(\underbrace{f\left(x^{(i)} ; \boldsymbol{W}\right)}_{\text{Prediction}}, \underbrace{y^{(i)}}_{\text{Actual}}\right) $$

Types of Loss Function
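Two standard instances, writing the empirical cost over $n$ examples as $J(\boldsymbol{W})=\frac{1}{n} \sum_{i=1}^n \mathcal{L}\left(f\left(x^{(i)} ; \boldsymbol{W}\right), y^{(i)}\right)$, are the binary cross-entropy loss for classification and the mean squared error loss for regression:

$$ J(\boldsymbol{W})=-\frac{1}{n} \sum_{i=1}^n\left[y^{(i)} \log f\left(x^{(i)} ; \boldsymbol{W}\right)+\left(1-y^{(i)}\right) \log \left(1-f\left(x^{(i)} ; \boldsymbol{W}\right)\right)\right] $$

$$ J(\boldsymbol{W})=\frac{1}{n} \sum_{i=1}^n\left(y^{(i)}-f\left(x^{(i)} ; \boldsymbol{W}\right)\right)^2 $$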

Optimization Algorithms

Optimization algorithms adjust the model's parameters $\boldsymbol{W}$ to minimize the loss function.
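The canonical method is gradient descent, which repeatedly moves the weights opposite the gradient of the loss; writing the learning rate as $\eta$ (a symbol introduced here for illustration):

$$ \boldsymbol{W} \leftarrow \boldsymbol{W}-\eta \frac{\partial J(\boldsymbol{W})}{\partial \boldsymbol{W}} $$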

Types of Optimization Algorithms
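Variants such as stochastic gradient descent (SGD), momentum, and Adam all refine the same basic loop. A minimal sketch of that loop, assuming a fixed learning rate and a caller-supplied `grad_fn` (a hypothetical helper returning the gradient of the loss with respect to the weights):

```python
import numpy as np

def gradient_descent(W, grad_fn, lr=0.01, steps=1000):
    # Repeatedly apply the update W <- W - lr * dJ/dW
    for _ in range(steps):
        W = W - lr * grad_fn(W)
    return W

# Toy example: minimize J(W) = ||W||^2, whose gradient is 2W
W = np.array([3.0, -2.0])
print(gradient_descent(W, lambda W: 2 * W))  # approaches [0, 0]
```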



