Skip to main content

3 Machine Learning

Scikit-Learn

OneHotEncoder

def one_hot_encode(data):
"""
Return the one-hot encoded dataframe of our input data.

Parameters
-----------
data: a dataframe that may include non-numerical features

Returns
-----------
A one-hot encoded dataframe that only contains numeric features

"""
enc = OneHotEncoder(drop='first') # remove redundancy by categories
# Eg:
# var1_apple = 1 - (var1_orange + var1_pear + var1_banana)
# var2_cat = 1 - (var2_dog + var2_fish)

enc.fit(data.select_dtypes(include='category'))
ohData = enc.transform(data.select_dtypes(include='category')).toarray()
ohResult = pd.DataFrame(ohData,columns= enc.get_feature_names_out())
# sex_Female sex_Male smoker_No smoker_Yes day_Fri day_Sat
return pd.concat([data.select_dtypes(exclude='category'),ohResult],axis =1)

Linear Regression

fit_intercept: bias as θ0\theta_0(False) or not(True)

from sklearn.linear_model import LinearRegression

sklearn_model = LinearRegression(fit_intercept=False).fit(one_hot_X,tips)
print("sklearn with bias column thetas:")
print(sklearn_model.coef_)
# sklearn with bias column thetas:
# [ 0.25496633 0.09448701 0.175992 0.14370363 0.11126269 0.17068732
# 0.084279 0.14104114 0.01958276 0.11556048 -0.02121806 0.09341886
# 0.16154746]

Pytorch

  • torch.Tensor() is an alias for the default tensor type (torch.FloatTensor).
  • torch.tensor(data, dtype=None, device=None, requires_grad=False)
Data typedtypeTensor type
32-bit floating pointtorch.float32 or torch.floattorch.*.FloatTensor
64-bit floating pointtorch.float64 or torch.doubletorch.*.DoubleTensor
16-bit floating pointtorch.float16 or torch.halftorch.*.HalfTensor
8-bit integer (unsigned)torch.uint8torch.*.ByteTensor
8-bit integer (signed)torch.int8torch.*.CharTensor
16-bit integer (signed)torch.int16 or torch.shorttorch.*.ShortTensor
32-bit integer (signed)torch.int32 or torch.inttorch.*.IntTensor
64-bit integer (signed)torch.int64 or torch.longtorch.*.LongTensor

Basic Operation

tensor shape is determined by the iteration of square brackets

torch.squeeze(): Returns a tensor with all specified dimensions of input of size 1 removed. If specified, a squeeze operation is done only in the given dimension.

a = torch.tensor(1) # torch.Size([])
a = a.unsqueeze(dim = 0) # torch.Size([1,1])

## squeeze and unsqueeze
a = torch.tensor([1,2]).unsqueeze(dim = 0) # torch.Size([1,2])
a = a.squeeze().unsqueeze(dim = 1) # torch.Size([2,1])
  • Tensor.tolist(): array
  • Tensor.item(): scalar