Project 4 – Evaluation

Feature Engineering. 1. Sklearn – Label Encoding. Label encoding allows you to encode the labels of a categorical variable as integer values between 0 and n_classes-1. For example, if a variable contains ['Paris', 'Paris', 'Tokyo', 'Amsterdam'], scikit-learn's LabelEncoder numbers the classes in sorted order (Amsterdam = 0, Paris = 1, Tokyo = 2), so after label encoding you will have [1, 1, 2, 0]. To access …
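A minimal sketch of label encoding the city list with scikit-learn's `LabelEncoder`. It shows that the codes follow the alphabetical order of the classes rather than their order of appearance, and that `inverse_transform` maps codes back to the original labels.

```python
from sklearn.preprocessing import LabelEncoder

cities = ["Paris", "Paris", "Tokyo", "Amsterdam"]

le = LabelEncoder()
codes = le.fit_transform(cities)  # learn the classes, then encode the list

print(le.classes_.tolist())  # ['Amsterdam', 'Paris', 'Tokyo'] (sorted classes)
print(codes.tolist())        # [1, 1, 2, 0]
# Map integer codes back to the original labels
print(le.inverse_transform([2, 0]).tolist())  # ['Tokyo', 'Amsterdam']
```

scikit-learn documents `LabelEncoder` as intended for target values (`y`). For input features, `OrdinalEncoder` or one-hot encoding is the usual alternative, since the integer codes imply an ordering that the categories may not have.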

Project 4: House Prices Predictions

Results. Having already completed a few projects in which my objective was to experience the overall workflow of a machine learning project, from data exploration and cleaning to feature engineering and building predictive models, I decided to aim for a ranking of 20th or better (top 1%) in this project, House Prices Predictions. Below …

Project 3: Big Mart Sales Prediction (Part 2)

5. Model, predict and solve the problem. Here's where we build our predictive models. We will go through six models, including linear regression, decision tree and random forest.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]: …
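A sketch of comparing three of the models named above (linear regression, decision tree, random forest) by 5-fold cross-validated RMSE. This is not the project's code. It assumes synthetic data from `make_regression` as a stand-in for the Big Mart features and sales target, and the hyperparameters (`max_depth=6`, `n_estimators=100`) are illustrative guesses.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Big Mart feature matrix and sales target (assumption)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, max_depth=6, random_state=0),
}

rmse = {}
for name, model in models.items():
    # This scorer returns negated RMSE (higher is better), so flip the sign
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()
    print(f"{name}: CV RMSE = {rmse[name]:.2f}")
```

Cross-validation scores every model on the same folds, which makes the comparison fairer than a single train/test split. Because this synthetic target is linear, linear regression wins here. That says nothing about which model suits the real sales data.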

Project 3: Big Mart Sales Prediction (Part 1)

Results. Workflow: (1) problem statement and hypothesis; (2) data exploration; (3) data cleaning; (4) feature engineering; (5) model, predict and solve the problem. 1. Problem Statement & Hypothesis. Big Mart Sales Practice Problem. The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. The data also …

Project 2: Titanic (Part 2)

3. Wrangle, prepare and cleanse the data. Dropping features: based on our data analysis, we want to drop the Cabin and Ticket features, because the Cabin feature is highly incomplete and the Ticket feature contains a high ratio of duplicates.

In [14]:
print("Before", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape)
train_df = train_df.drop(['Ticket', 'Cabin'], axis=1)
test_df = test_df.drop(['Ticket', 'Cabin'], axis=1)
combine …
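A small sketch of the reasoning behind dropping the two columns. It measures how incomplete Cabin is and how often Ticket values repeat, then drops both from the training and test frames. The DataFrame contents are hypothetical toy values, not the Kaggle Titanic data. Rebuilding `combine = [train_df, test_df]` after the drop is also an assumption, since the excerpt does not show how `combine` is defined.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for the Titanic train/test frames
train_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Ticket": ["A/5 21171", "PC 17599", "PC 17599", "113803", "113803", "113803"],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan, np.nan],
    "Fare": [7.25, 71.28, 71.28, 53.1, 53.1, 53.1],
})
test_df = train_df.drop(index=[0, 1]).reset_index(drop=True)

# Fraction of missing Cabin values and fraction of repeated Ticket values
cabin_missing = train_df["Cabin"].isna().mean()
ticket_dup = 1 - train_df["Ticket"].nunique() / len(train_df)
print(f"Cabin missing: {cabin_missing:.0%}, Ticket duplicates: {ticket_dup:.0%}")

print("Before", train_df.shape, test_df.shape)
train_df = train_df.drop(["Ticket", "Cabin"], axis=1)
test_df = test_df.drop(["Ticket", "Cabin"], axis=1)
# Assumption: combine is rebuilt so it refers to the updated frames
combine = [train_df, test_df]
print("After", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape)
```

`DataFrame.drop` returns a new DataFrame by default. A list built before the drop would therefore still hold the old frames, which is why the sketch rebuilds `combine` afterwards.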

Project 2: Titanic (Part 1)

Results. Workflow: (1) Defining the question/problem; (2) Acquire the training and testing data, and analyse, identify patterns in and explore the data; (3) Wrangle, prepare and cleanse the data; (4) Model, predict and solve the problem; (5) Visualise, report and present the problem-solving steps and final solution. 1. Defining the Question/Problem. Titanic: Machine Learning from Disaster. The …

CS50: Final Project

CS50 Final Project: Active vs Passive Investing. There has been a strong debate in the financial industry about whether asset managers deserve their yearly 2% management fee and 20% performance fee. Does active investment outperform passive investment? In this research project, I will be looking at whether a particular active investment …
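To make the fee question concrete, here is a minimal sketch of the arithmetic under a simplified "2 and 20" structure. Real fund terms vary, so the following assumptions are ours, not the project's:

- The 2% management fee is deducted as a flat share of assets.
- The 20% performance fee applies only to positive returns left after the management fee.
- There is no hurdle rate and no high-water mark.

The 10% active gross return, the 8% index return and the 0.1% index-fund expense ratio are hypothetical illustrative inputs, not data from the project.

```python
def active_net_return(gross: float, mgmt_fee: float = 0.02, perf_fee: float = 0.20) -> float:
    """Investor's one-year net return under a simplified '2 and 20' structure.

    Assumptions: flat management fee, performance fee only on positive
    returns remaining after the management fee, no hurdle, no high-water mark.
    """
    after_mgmt = gross - mgmt_fee
    return after_mgmt - perf_fee * max(after_mgmt, 0.0)


def passive_net_return(gross: float, expense_ratio: float = 0.001) -> float:
    """Net return of a hypothetical index fund that charges only an expense ratio."""
    return gross - expense_ratio


def breakeven_gross(target_net: float, mgmt_fee: float = 0.02, perf_fee: float = 0.20) -> float:
    """Gross return an active fund needs so investors net target_net (assumed >= 0)."""
    # Inverts target_net = (1 - perf_fee) * (gross - mgmt_fee)
    return target_net / (1 - perf_fee) + mgmt_fee


active = active_net_return(0.10)    # hypothetical 10% gross active return
passive = passive_net_return(0.08)  # hypothetical 8% index return
print(f"active net: {active:.2%}, passive net: {passive:.2%}")
print(f"active gross needed to match passive: {breakeven_gross(passive):.2%}")
```

With these inputs the active fund nets about 6.4% against 7.9% for the index fund. To match the index fund, it would need roughly 11.9% gross, which is the kind of gap the active-versus-passive comparison has to overcome.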