3 Machine Learning
Scikit-Learn
OneHotEncoder
def one_hot_encode(data):
"""
Return the one-hot encoded dataframe of our input data.
Parameters
-----------
data: a dataframe that may include non-numerical features
Returns
-----------
A one-hot encoded dataframe that only contains numeric features
"""
enc = OneHotEncoder(drop='first') # remove redundancy by categories
# Eg:
# var1_apple = 1 - (var1_orange + var1_pear + var1_banana)
# var2_cat = 1 - (var2_dog + var2_fish)
enc.fit(data.select_dtypes(include='category'))
ohData = enc.transform(data.select_dtypes(include='category')).toarray()
ohResult = pd.DataFrame(ohData,columns= enc.get_feature_names_out())
# sex_Female sex_Male smoker_No smoker_Yes day_Fri day_Sat
return pd.concat([data.select_dtypes(exclude='category'),ohResult],axis =1)
Linear Regression
fit_intercept: bias as (False) or not(True)
from sklearn.linear_model import LinearRegression
sklearn_model = LinearRegression(fit_intercept=False).fit(one_hot_X,tips)
print("sklearn with bias column thetas:")
print(sklearn_model.coef_)
# sklearn with bias column thetas:
# [ 0.25496633 0.09448701 0.175992 0.14370363 0.11126269 0.17068732
# 0.084279 0.14104114 0.01958276 0.11556048 -0.02121806 0.09341886
# 0.16154746]
Pytorch
- torch.Tensor() is an alias for the default tensor type (torch.FloatTensor).
- torch.tensor(data, dtype=None, device=None, requires_grad=False)
Data type | dtype | Tensor type |
---|---|---|
32-bit floating point | torch.float32 or torch.float | torch.*.FloatTensor |
64-bit floating point | torch.float64 or torch.double | torch.*.DoubleTensor |
16-bit floating point | torch.float16 or torch.half | torch.*.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.*.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.*.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.*.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.*.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.*.LongTensor |
Basic Operation
tensor shape is determined by the iteration of square brackets
torch.squeeze(): Returns a tensor with all specified dimensions of input of size 1 removed. If specified, a squeeze operation is done only in the given dimension.
a = torch.tensor(1) # torch.Size([])
a = a.unsqueeze(dim = 0) # torch.Size([1,1])
## squeeze and unsqueeze
a = torch.tensor([1,2]).unsqueeze(dim = 0) # torch.Size([1,2])
a = a.squeeze().unsqueeze(dim = 1) # torch.Size([2,1])
- Tensor.tolist(): array
- Tensor.item(): scalar