Balázs Kégl (LAL/CNRS)
Boston housing is a small standard regression data set from the UCI Machine Learning Repository.
from __future__ import print_function
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
local_filename = 'data/train.csv'
# Open file and print the first 3 lines
with open(local_filename) as fid:
    for line in fid.readlines()[:3]:
        print(line)
data = pd.read_csv(local_filename)
data.head()
data.shape
data.describe()
data.hist(figsize=(10, 20), bins=50, layout=(7, 3));
sns.pairplot(data.iloc[:, :5]); # take only 5 to make it fast enough
To submit to the RAMP site, you will need to create an estimator.py file that defines a get_estimator function, which returns a scikit-learn estimator. You can find an example estimator.py file in submissions/starting_kit.
The initial example regressor in your sandbox is:
from sklearn.ensemble import RandomForestRegressor


def get_estimator():
    reg = RandomForestRegressor(
        n_estimators=2, max_leaf_nodes=2, random_state=61)
    return reg
Before you make your submission, it is important that you test your code locally. For the submission procedure itself, refer to the online documentation.
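As a rough illustration of a local check, the sketch below cross-validates the estimator returned by get_estimator and reports a per-fold RMSE. It is not the RAMP test harness. The helper local_rmse, the 5-fold split, the choice of RMSE as the metric, and the synthetic data from make_regression are all assumptions made so that the example runs on its own. To try it on the real data, replace X and y with the feature columns and the target column of data/train.csv.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def get_estimator():
    # Same starting-kit regressor as shown above.
    reg = RandomForestRegressor(
        n_estimators=2, max_leaf_nodes=2, random_state=61)
    return reg


def local_rmse(X, y, n_folds=5):
    """Hypothetical helper: per-fold RMSE of get_estimator() on (X, y)."""
    # scikit-learn reports negated MSE so that larger is better;
    # flip the sign and take the square root to obtain RMSE.
    neg_mse = cross_val_score(
        get_estimator(), X, y, cv=n_folds,
        scoring='neg_mean_squared_error')
    return np.sqrt(-neg_mse)


# Synthetic stand-in data (an assumption, not the Boston housing data).
X, y = make_regression(
    n_samples=200, n_features=13, noise=10.0, random_state=0)

rmse = local_rmse(X, y)
print('RMSE per fold:', rmse)
print('Mean RMSE: {:.2f}'.format(rmse.mean()))
```

With only two trees of two leaves each, the starting estimator is deliberately weak, so a check like this is mainly useful for confirming that the code runs end to end and for comparing later models against this baseline.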