Linear regression

Our definition of a machine learning algorithm, as an algorithm that is capable of improving a computer program’s performance at some task via experience, is somewhat abstract. To make this more concrete, we present an example of a simple machine learning algorithm: linear regression. As the name implies, linear regression solves a regression problem. In other words, the goal is to build a system that can take a vector $x \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output. In the case of linear regression, the output is a linear function of the input. Let $\hat{y}$ be the value that our model predicts $y$ should take on. We define the output to be $\hat{y} = w^T x$, where $w \in \mathbb{R}^n$ is a vector of parameters. Parameters are values that control the behavior of the system. In this case, $w_i$ is the coefficient that we multiply by feature $x_i$ before summing up the contributions from all the features. We can think of $w$ as a set of weights that determine how eac...
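The model described above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original post: the data, the weight vector `true_w`, and the use of ordinary least squares to fit $w$ are all assumptions made for the example.

```python
import numpy as np

# Hypothetical data: m = 5 training examples, n = 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # each row is an input vector x
true_w = np.array([2.0, -1.0, 0.5])  # assumed "ground truth" weights
y = X @ true_w                       # targets (noise-free for simplicity)

# Fit the parameter vector w by ordinary least squares:
# w = argmin_w ||Xw - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict y_hat = w^T x for a new input vector
x_new = np.array([1.0, 2.0, 3.0])
y_hat = w @ x_new
```

Because the targets are noise-free and linear in the inputs, the recovered `w` matches `true_w`, and each component $w_i$ plays exactly the role described above: the weight multiplying feature $x_i$ before the contributions are summed.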

Gradient Descent, Stochastic Gradient Descent, Batch Gradient Descent

Gradient Descent

Gradient descent is an optimization algorithm often used for finding the weights or coefficients of machine learning algorithms, such as artificial neural networks and logistic regression. It works by having the model make predictions on training data and using the error in those predictions to update the model in such a way as to reduce the error. The goal of the algorithm is to find model parameters (e.g. coefficients or weights) that minimize the error of the model on the training dataset. It does this by making changes to the model that move it along a gradient, or slope, of errors down toward a minimum error value. This gives the algorithm its name of “gradient descent.”

Types of Gradient Descent

Gradient descent can vary in terms of the number of training patterns used to calculate the error that is in turn used to update the model. The number of patterns used to calculate the error influences how stable the gradient used to update the model is. We will see that t...
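The contrast between the variants can be sketched for a simple linear model with squared error. This is an illustrative assumption on my part, not the original post's code: batch gradient descent computes the gradient over the full training set per update, while stochastic gradient descent updates after each individual example, giving noisier but more frequent steps.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))       # 100 training patterns, 2 features
true_w = np.array([1.5, -0.5])
y = X @ true_w                      # noise-free targets for simplicity

def batch_gradient_descent(X, y, lr=0.1, epochs=200):
    """One update per epoch, using the gradient of the mean squared
    error over the entire training set (a stable gradient estimate)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def stochastic_gradient_descent(X, y, lr=0.05, epochs=50):
    """One update per training pattern, using the gradient of that
    single example's squared error (a noisy gradient estimate)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = 2.0 * X[i] * (X[i] @ w - y[i])
            w -= lr * grad
    return w

w_batch = batch_gradient_descent(X, y)
w_sgd = stochastic_gradient_descent(X, y)
```

Both variants descend the same error surface; the trade-off is that the full-batch gradient is smoother per step, while the per-example updates are cheaper and arrive far more often.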

Penalty Based Regularizations- L1 and L2

Penalty-based regularization is the most common approach for reducing overfitting. In order to understand this point, let us revisit the example of the polynomial with degree $d$. In this case, the prediction $\hat{y}$ for a given value of $x$ is as follows: $\hat{y} = \sum_{i=0}^{d} w_i x^i$. It is possible to use a single-layer network with $d$ inputs and a single bias neuron with weight $w_0$ in order to model this prediction. The $i$th input is $x^i$. This neural network uses linear activations, and the squared loss function for a set of training instances $(x, y)$ from data set $D$ can be defined as follows: $L = \sum_{(x,y) \in D} (y - \hat{y})^2$. As discussed earlier, a large value of $d$ tends to increase overfitting. One possible solution to this problem is to reduce the value of $d$. In other words, using a model with economy in parameters leads to a simpler model. For example, reducing $d$ to 1 creates a linear model that has fewer degrees of freedom and tends to fit the data ...