Large Scale Machine Learning

Large Datasets We mainly benefit from a very large dataset when our algorithm suffers from high variance (as happens when m is small relative to the model's complexity). An algorithm with high bias will not benefit from an increase in dataset size. This tells us that before setting up the infrastructure to handle large datasets, it would be a good idea …
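A minimal sketch of the check this note alludes to, assuming scikit-learn and placeholder data: plot training versus cross-validation accuracy as m grows. If the two curves converge at a poor score, the model has high bias and more data will not help; a persistent gap suggests high variance, where a larger dataset is worth the infrastructure cost.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Placeholder data standing in for a real problem.
X, y = np.random.randn(1000, 5), np.random.randint(0, 2, 1000)

sizes, train_scores, cv_scores = learning_curve(
    LogisticRegression(), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 8),
)

# Compare training and cross-validation accuracy at each training-set size.
for m, tr, cv in zip(sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    print(f"m={m:4d}  train acc={tr:.3f}  cv acc={cv:.3f}")
```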

Anomaly Detection & Recommender Systems

Introduction to Anomaly Detection Problem Motivation A very common application of anomaly detection is detecting fraud. How anomaly detection works: given a dataset, we define a model p(x) that estimates the probability that a new example is not anomalous. We use a threshold ϵ (epsilon) to divide the new data into …
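A minimal NumPy sketch of the p(x)-and-threshold scheme described above: fit an independent Gaussian to each feature of the (non-anomalous) training data, then flag a new example as anomalous when its modelled density p(x) falls below ϵ. The data and the ϵ value here are illustrative assumptions; in practice ϵ is tuned on a labelled cross-validation set.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from non-anomalous data."""
    return X.mean(axis=0), X.var(axis=0)

def p(x, mu, var):
    """Density of x under the independent-Gaussian model: product over features."""
    coeff = 1.0 / np.sqrt(2 * np.pi * var)
    return np.prod(coeff * np.exp(-((x - mu) ** 2) / (2 * var)))

X_train = np.random.randn(500, 2)   # placeholder "normal" data
mu, var = fit_gaussian(X_train)

epsilon = 0.01                      # assumed threshold for this sketch
x_new = np.array([4.0, -4.0])
print("anomalous" if p(x_new, mu, var) < epsilon else "normal")
```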

K-Means Clustering & Dimensionality Reduction

Introduction The K-Means clustering algorithm is the most popular and widely used algorithm for automatically grouping data into coherent subsets. Here’s how to implement the algorithm (K=2): Randomly initialise two points in the dataset, called the cluster centroids. Cluster assignment: assign each example to one of the two groups based on which …
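A minimal NumPy sketch of the two-step loop described above for K=2: randomly initialise two centroids, assign each example to its nearest centroid, then move each centroid to the mean of its assigned examples. The generated data is a placeholder.

```python
import numpy as np

def kmeans(X, K=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialise centroids as K distinct training examples.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        # Cluster assignment: index of the closest centroid for each example.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move centroids to the mean of their assigned examples.
        # (This sketch does not handle the empty-cluster edge case.)
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
centroids, labels = kmeans(X)
print(centroids)
```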

Support Vector Machines

Optimisation Objective The new cost_1 and cost_0 functions indicate that we want: Decision boundary If the value of C is very large (say 100,000), in order to minimise the cost function we would want the value inside the summation term to be very small so that the first term is equal …
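A small sketch of the hinge-style costs this note refers to, under the course's usual convention (an assumption here): cost_1(z) applies when y = 1 and is zero once z = θᵀx ≥ 1, while cost_0(z) applies when y = 0 and is zero once z ≤ −1. With a very large C, minimising the objective forces the summation of these costs toward zero, pushing every example beyond the margin.

```python
import numpy as np

def cost_1(z):
    """Cost for a positive example: zero when z >= 1, linear below that."""
    return np.maximum(0, 1 - z)

def cost_0(z):
    """Cost for a negative example: zero when z <= -1, linear above that."""
    return np.maximum(0, 1 + z)

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(cost_1(z))   # [3. 2. 1. 0. 0.]
print(cost_0(z))   # [0. 0. 1. 2. 3.]
```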

Machine Learning System Design

Prioritising what to work on Let’s assume we want to build a spam classifier. Given a set of emails, we could construct a vector for each email where each entry represents a word. The vector will normally contain 10,000 to 50,000 entries gathered by finding the most frequently used words in …
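A minimal sketch of the feature construction described above: given a vocabulary of the most frequent words, each email becomes a binary vector whose j-th entry is 1 if the j-th vocabulary word appears in the email. The tiny vocabulary here stands in for the 10,000-to-50,000-word list.

```python
import numpy as np

vocab = ["buy", "now", "free", "meeting", "lunch"]  # placeholder vocabulary
word_index = {w: j for j, w in enumerate(vocab)}

def email_to_vector(email_text):
    """Map an email to a binary word-presence feature vector."""
    x = np.zeros(len(vocab))
    for word in email_text.lower().split():
        if word in word_index:
            x[word_index[word]] = 1
    return x

print(email_to_vector("Buy now and get one FREE"))  # [1. 1. 1. 0. 0.]
```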

How to evaluate and improve your algorithms

Learning algorithm not performing well? What’s next? When you test your hypothesis and find that it makes large errors in its predictions, what should you try next? Get more training examples; try a smaller set of features; get additional features; add polynomial features; increase or decrease lambda. A machine learning diagnostic is …
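A diagnostic sketch in the spirit of this note, assuming scikit-learn, Ridge regression, and synthetic data: rather than guessing which fix to try, measure training versus cross-validation error while varying the regularisation strength lambda. High error on both sets points to bias (try more features or polynomial terms); a large gap points to variance (try more data or a larger lambda).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Placeholder regression data.
X = np.random.randn(200, 10)
y = X @ np.random.randn(10) + 0.5 * np.random.randn(200)

for lam in [0.01, 0.1, 1, 10, 100]:
    model = Ridge(alpha=lam).fit(X, y)
    train_err = np.mean((model.predict(X) - y) ** 2)
    cv_err = -cross_val_score(Ridge(alpha=lam), X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"lambda={lam:6.2f}  train MSE={train_err:.3f}  cv MSE={cv_err:.3f}")
```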

Neural Networks: Learning

Cost function and Backpropagation Cost function L = total number of layers in the network; s_i = number of units (not counting the bias unit) in layer i; K = number of output units/classes. Backpropagation An algorithm that aims to minimise the NN cost function by computing an optimal set of parameters …
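For reference, these symbols fit the standard regularised neural-network cost function from the course (written here with t indexing training examples, a notational assumption):

```latex
J(\Theta) = -\frac{1}{m} \sum_{t=1}^{m} \sum_{k=1}^{K}
  \left[ y_k^{(t)} \log\big(h_\Theta(x^{(t)})\big)_k
       + \big(1 - y_k^{(t)}\big) \log\Big(1 - \big(h_\Theta(x^{(t)})\big)_k\Big) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \big(\Theta_{j,i}^{(l)}\big)^2
```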

Neural Networks: Representation

Introduction Neural networks are useful for non-linear hypotheses. When solving machine learning problems, we usually deal with more than just two features. As the number of features n grows, it becomes harder to model the dataset using linear or logistic regression. For example, suppose you are writing an …

Gradient Descent

Introduction Gradient descent can be used to minimise many different types of cost functions. For example, let’s say we have some function J(a, b) and our objective is to choose the best a and b values that minimise the J function. The idea behind gradient descent is that we …
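A minimal sketch of gradient descent on a two-parameter cost J(a, b), using an assumed toy quadratic so the gradient has a closed form. Each step moves (a, b) a small amount, set by the learning rate, in the direction opposite the gradient.

```python
import numpy as np

def J(a, b):
    """Toy convex cost with its minimum at (3, -2)."""
    return (a - 3) ** 2 + (b + 2) ** 2

def grad_J(a, b):
    """Partial derivatives of J with respect to a and b."""
    return np.array([2 * (a - 3), 2 * (b + 2)])

a, b = 0.0, 0.0    # initial guess
alpha = 0.1        # learning rate
for step in range(100):
    a, b = np.array([a, b]) - alpha * grad_J(a, b)

print(f"a={a:.4f}, b={b:.4f}, J={J(a, b):.6f}")  # converges near (3, -2)
```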