Posts

Ensemble Methods: Bagging, Resampling, and Boosting

Ensemble methods derive their inspiration from the bias-variance trade-off. One way of reducing the error of a classifier is to find a way to reduce either its bias or its variance without affecting the other component. Ensemble methods are used commonly in machine learning, and two examples of such methods are bagging and boosting. The former is a method for variance reduction, whereas the latter is a method for bias reduction. Most ensemble methods in neural networks are focused on variance reduction. This is because neural networks are valued for their ability to build arbitrarily complex models in which the bias is relatively low. However, operating at the complex end of the bias-variance trade-off almost always leads to higher variance, which manifests as overfitting. Therefore, the goal of most ensemble methods in the neural network setting is variance reduction (i.e., better generalization). Bagging (short for bootstrap aggregating) is a technique for reducing generaliza...
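
As a concrete illustration of the variance-reduction idea behind bagging, here is a minimal sketch using scikit-learn; the dataset, estimator count, and split are illustrative assumptions rather than details from the post. Each base learner is fit on a bootstrap resample of the training data, and the ensemble aggregates their predictions by voting.

```python
# Minimal bagging sketch (illustrative data and settings, not from the post).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default base estimator is a decision tree, a typical high-variance learner.
# Each of the n_estimators trees is trained on a bootstrap resample.
bagger = BaggingClassifier(n_estimators=25, random_state=0)
bagger.fit(X_train, y_train)
print("Bagged test accuracy:", bagger.score(X_test, y_test))
```

Because the bootstrap resamples differ, the individual high-variance learners make partly independent errors, and aggregating them reduces the variance of the combined prediction without much effect on the bias.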

Early Stopping

Neural networks are trained using variations of gradient-descent methods. In most optimization models, gradient descent is executed to convergence. However, executing gradient descent to convergence optimizes the loss on the training data, but not necessarily on the out-of-sample test data. This is because the final few steps often overfit to the specific nuances of the training data, which might not generalize well to the test data. A natural solution to this dilemma is to use early stopping. In this method, a portion of the training data is held out as a validation set. The backpropagation-based training is applied only to the portion of the training data that does not include the validation set. At the same time, the error of the model on the validation set is continuously monitored. At some point, this error begins to rise on the validation set, even though it continues to decrease on the training set. This is the point at which further training causes overfitting. There...
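
The following is a minimal sketch of the procedure described above, using synthetic data and a simple logistic-regression model as a stand-in for a neural network; the data, model, learning rate, and patience value are all illustrative assumptions. Gradient descent runs on the training portion while the validation loss is monitored, and training stops once the validation loss has failed to improve for a fixed number of epochs.

```python
# Minimal early-stopping sketch (illustrative data and settings).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true + 0.5 * rng.normal(size=1000) > 0).astype(float)

# Hold out a validation set from the training data.
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

def loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

w = np.zeros(20)
best_val, best_w, patience, bad_epochs = np.inf, w.copy(), 10, 0
for epoch in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
    grad = X_train.T @ (p - y_train) / len(y_train)
    w -= 0.5 * grad                       # gradient-descent step on training data only
    val = loss(w, X_val, y_val)
    if val < best_val:                    # validation error still falling
        best_val, best_w, bad_epochs = val, w.copy(), 0
    else:                                 # validation error rising
        bad_epochs += 1
        if bad_epochs >= patience:        # stop before overfitting worsens
            break

# best_w holds the weights at the validation minimum.
print("Stopped at epoch", epoch, "with best validation loss", round(best_val, 4))
```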

Generalization Issues in Model Tuning and Evaluation

There are several practical issues in the training of neural network models that one must be careful of because of the bias-variance trade-off. The first of these issues is associated with model tuning and hyperparameter choice. For example, if one tuned the neural network with the same data that were used to train it, one would not obtain very good results because of overfitting. Therefore, the hyperparameters (e.g., the regularization parameter) are tuned on a held-out set separate from the one on which the weight parameters of the neural network are learned. Given a labeled data set, one needs to use this resource for training, tuning, and testing the accuracy of the model. Clearly, one cannot use the entire resource of labeled data for model building (i.e., learning the weight parameters). For example, using the same data set for both model building and testing grossly overestimates the accuracy. This is because the main goal of classification is to generalize a model of labeled data...
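
A minimal sketch of such a three-way use of the labeled data is shown below, assuming a 60/20/20-style split with scikit-learn; the dataset and proportions are illustrative. The weights are learned on the training portion, hyperparameters are tuned on the validation portion, and the test portion is touched only once for the final accuracy estimate.

```python
# Minimal train/validation/test split sketch (illustrative proportions).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First carve off the test set, then split the remainder into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

print(len(X_train), "train /", len(X_val), "validation /", len(X_test), "test examples")
```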

Data Augmentation

The best way to make a machine learning model generalize better is to train it on more data. Of course, in practice, the amount of data we have is limited. One way to get around this problem is to create fake data and add it to the training set. For some machine learning tasks, it is reasonably straightforward to create new fake data. A common trick to reduce overfitting in convolutional neural networks is data augmentation, in which new training examples are generated by applying transformations to the original examples, although it works better in some domains than others. Image processing is one domain to which data augmentation is very well suited, because many transformations such as translation, rotation, patch extraction, and reflection do not fundamentally change the properties of the object in an image. However, they do increase the generalization power of the model when it is trained on the augmented data set. For example, if a data set is train...
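
As a small sketch of the kinds of transformations mentioned above (the toy image and the specific operations are illustrative assumptions), the following generates reflected, rotated, and patch-extracted variants of a single array-valued image; in practice each variant would be added to the training set with the same label as the original.

```python
# Minimal data-augmentation sketch on a toy grayscale image (illustrative).
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for a 28x28 grayscale image

augmented = [
    np.fliplr(image),                 # horizontal reflection
    np.rot90(image, k=1),             # 90-degree rotation
    image[2:26, 2:26],                # 24x24 patch extraction (translation)
]
print([a.shape for a in augmented])
```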

Deep Feedforward Network - Learning XOR

Modern deep learning provides a very powerful framework for supervised learning. By adding more layers and more units within a layer, a deep network can represent functions of increasing complexity. Most tasks that consist of mapping an input vector to an output vector, and that are easy for a person to do rapidly, can be accomplished via deep learning, given sufficiently large models and sufficiently large datasets of labeled training examples. Deep feedforward networks, also often called feedforward neural networks, or multilayer perceptrons (MLPs), are the quintessential deep learning models. The goal of a feedforward network is to approximate some function $f^*$. For example, for a classifier, $y = f^*(x)$ maps an input $x$ to a category $y$. A feedforward network defines a mapping $y = f(x; \theta)$ and learns the value of the parameters $\theta$ that result in the best function approximation. These models are called feedforward because information flows through the function b...
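
Since the post's title refers to learning XOR, here is a minimal numpy sketch of a feedforward network $y = f(x; \theta)$ with one hidden layer trained by gradient descent on the XOR truth table; the layer sizes, learning rate, and iteration count are illustrative choices, not values from the post.

```python
# Minimal feedforward network learning XOR (illustrative hyperparameters).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))   # hidden layer: 4 units
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))   # output layer: 1 unit
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass: tanh hidden layer, sigmoid output.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass for the binary cross-entropy loss.
    dZ2 = (p - y) / len(X)
    dW2, db2 = h.T @ dZ2, dZ2.sum(axis=0, keepdims=True)
    dH = dZ2 @ W2.T * (1.0 - h ** 2)
    dW1, db1 = X.T @ dH, dH.sum(axis=0, keepdims=True)

    # Gradient-descent parameter update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

h = np.tanh(X @ W1 + b1)
print(np.round(sigmoid(h @ W2 + b2), 3))   # should approach [0, 1, 1, 0]
```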

Previous Year Question Papers CST 395 Neural Network and Deep Learning KTU

 

Regression with Gradient Descent - Python Code

Optimizing parameters is the ultimate goal of every machine learning algorithm. You want to find the optimum values of the slope and the intercept to get the line of best fit in linear regression problems, and you want to find the optimum values of the parameters of a sigmoidal curve in logistic regression problems. Gradient descent does it all. Loss Function: A function that returns the cost associated with the model and measures how well our model is doing on the training data. If the cost is too high, it means that the predictions of our model are deviating too much from the observed data. In any machine learning algorithm, our ultimate mission is to minimize the loss function. Commonly used loss functions include regression losses such as the L1 loss (mean absolute error), the L2 loss (mean squared error), and the root mean squared error, and classification losses such as the log loss (cross-entropy loss) and the SVM loss (hinge loss). Learning Rate: This is the hyperparameter that determines the steps the gradient descent al...
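
As a concrete companion to the description above, here is a minimal Python sketch of gradient descent fitting the slope and intercept of a line to synthetic data under the L2 loss; the data, learning rate, and iteration count are illustrative assumptions, not values from the post.

```python
# Minimal gradient-descent sketch for simple linear regression (illustrative).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=100)   # true line: y = 3x + 2

m, b = 0.0, 0.0
learning_rate = 0.01            # step size (a hyperparameter)
for _ in range(2000):
    y_pred = m * x + b
    error = y_pred - y
    # Gradients of the mean squared error with respect to m and b.
    dm = 2.0 * np.mean(error * x)
    db = 2.0 * np.mean(error)
    m -= learning_rate * dm
    b -= learning_rate * db

print("slope     ~", round(m, 2))
print("intercept ~", round(b, 2))
```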