3 Machine Learning

Scikit-Learn

OneHotEncoder

def one_hot_encode(data):
    """
    Return the one-hot encoded dataframe of our input data.
    
    Parameters
    -----------
    data: a dataframe that may include non-numerical features
    
    Returns
    -----------
    A one-hot encoded dataframe that only contains numeric features
    
    """
    enc = OneHotEncoder(drop='first') # remove redundancy by categories
    # Eg:
    # var1_apple = 1 - (var1_orange + var1_pear + var1_banana)
    # var2_cat = 1 - (var2_dog + var2_fish)

    enc.fit(data.select_dtypes(include='category'))
    ohData = enc.transform(data.select_dtypes(include='category')).toarray()
    ohResult = pd.DataFrame(ohData,columns= enc.get_feature_names_out())
    # sex_Female    sex_Male    smoker_No   smoker_Yes  day_Fri day_Sat
    return pd.concat([data.select_dtypes(exclude='category'),ohResult],axis =1)

Linear Regression

fit_intercept: bias as $\theta_0$ (False) or not(True)

from sklearn.linear_model import LinearRegression

sklearn_model = LinearRegression(fit_intercept=False).fit(one_hot_X,tips)
print("sklearn with bias column thetas:")
print(sklearn_model.coef_)
# sklearn with bias column thetas:
# [ 0.25496633  0.09448701  0.175992    0.14370363  0.11126269  0.17068732
#   0.084279    0.14104114  0.01958276  0.11556048 -0.02121806  0.09341886
#   0.16154746]

Pytorch

torch.Tensor() is an alias for the default tensor type (torch.FloatTensor).
torch.tensor(data, dtype=None, device=None, requires_grad=False)

Data type	dtype	Tensor type
32-bit floating point	torch.float32 or torch.float	torch.*.FloatTensor
64-bit floating point	torch.float64 or torch.double	torch.*.DoubleTensor
16-bit floating point	torch.float16 or torch.half	torch.*.HalfTensor
8-bit integer (unsigned)	torch.uint8	torch.*.ByteTensor
8-bit integer (signed)	torch.int8	torch.*.CharTensor
16-bit integer (signed)	torch.int16 or torch.short	torch.*.ShortTensor
32-bit integer (signed)	torch.int32 or torch.int	torch.*.IntTensor
64-bit integer (signed)	torch.int64 or torch.long	torch.*.LongTensor

Basic Operation

tensor shape is determined by the iteration of square brackets

torch.squeeze(): Returns a tensor with all specified dimensions of input of size 1 removed. If specified, a squeeze operation is done only in the given dimension.

a = torch.tensor(1) # torch.Size([])
a = a.unsqueeze(dim = 0) # torch.Size([1,1])

## squeeze and unsqueeze
a = torch.tensor([1,2]).unsqueeze(dim = 0) # torch.Size([1,2])
a = a.squeeze().unsqueeze(dim = 1) # torch.Size([2,1])

Tensor.tolist(): array
Tensor.item(): scalar

3 Machine Learning

Scikit-Learn​

Pytorch​

Basic Operation​

Scikit-Learn

Pytorch

Basic Operation