The Realm of Supervised Learning

Preprocessing data using different techniques

Getting ready
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

How to do it…
# Mean removal: center each feature at 0 and scale it to unit variance
data_standardized = preprocessing.scale(data)
print("\nMean =", data_standardized.mean(axis=0))
print("Std deviation =", data_standardized.std(axis=0))
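
Run the script from your terminal:

    python preprocessor.py

The mean of each feature is now almost 0 and the standard deviation is 1: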

Mean = [ 5.55111512e-17 -1.11022302e-16 -7.40148683e-17 -7.40148683e-17]

Std deviation = [ 1. 1. 1. 1.]
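
To make the mean removal step concrete, here is a minimal sketch that reproduces it by hand with NumPy. It assumes that `preprocessing.scale` standardizes each column using the population standard deviation, which is NumPy's default; the names `manual` and `matches_scale` are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

# Subtract each column's mean, then divide by that column's standard deviation
manual = (data - data.mean(axis=0)) / data.std(axis=0)

# Compare against scikit-learn's result, allowing for floating-point error
matches_scale = np.allclose(manual, preprocessing.scale(data))
print("Matches preprocessing.scale:", matches_scale)
```

The tiny nonzero means in the output above are floating-point rounding, not a sign that the centering failed.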

# Scaling: map each feature linearly onto the range [0, 1]
data_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled = data_scaler.fit_transform(data)
print("\nMin max scaled data =", data_scaled)
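
As a rough illustration of what min-max scaling computes for a (0, 1) range, the sketch below applies the formula (x - min) / (max - min) to each column and compares it with the scaler's output. The variable names are illustrative assumptions.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

# Per-column minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Each column's smallest value maps to 0 and its largest to 1
manual_scaled = (data - col_min) / (col_max - col_min)

sklearn_scaled = preprocessing.MinMaxScaler(feature_range=(0, 1)).fit_transform(data)
print("Matches MinMaxScaler:", np.allclose(manual_scaled, sklearn_scaled))
```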
# Normalization: rescale each row so its absolute values sum to 1
data_normalized = preprocessing.normalize(data, norm='l1')
print("\nL1 normalized data =", data_normalized)
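
Unlike the two steps before it, L1 normalization works on rows (samples) rather than columns (features). A small sketch, with illustrative variable names, that checks this by hand:

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

# Divide every row by the sum of the absolute values in that row
manual_l1 = data / np.abs(data).sum(axis=1, keepdims=True)

sklearn_l1 = preprocessing.normalize(data, norm='l1')
row_sums = np.abs(sklearn_l1).sum(axis=1)
print("Matches normalize:", np.allclose(manual_l1, sklearn_l1))
print("Sum of absolute values per row:", row_sums)
```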
# Binarization: values above the threshold become 1, all others become 0
data_binarized = preprocessing.Binarizer(threshold=1.4).transform(data)
print("\nBinarized data =", data_binarized)
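
Binarization is a strict greater-than comparison against the threshold. The following sketch (names are illustrative; it calls `fit_transform` rather than `transform` only to stay safe across scikit-learn versions) checks that against a plain NumPy comparison:

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

# True where a value exceeds the threshold, converted to 1.0 / 0.0
manual_binary = (data > 1.4).astype(float)

sklearn_binary = preprocessing.Binarizer(threshold=1.4).fit_transform(data)
print("Matches Binarizer:", np.array_equal(manual_binary, sklearn_binary))
print(manual_binary)
```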

# One Hot Encoding: each column's distinct values get their own 0/1 slots
encoder = preprocessing.OneHotEncoder()
encoder.fit([[0, 2, 1, 12], [1, 3, 5, 3], [2, 3, 2, 12], [1, 2, 4, 3]])
encoded_vector = encoder.transform([[2, 3, 5, 3]]).toarray()
print("\nEncoded vector =", encoded_vector)
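
The length of the encoded vector follows from how many distinct values each column held during fitting. This sketch inspects that through the `categories_` attribute, which assumes scikit-learn 0.20 or later; `sizes` and `encoded` are illustrative names.

```python
from sklearn import preprocessing

encoder = preprocessing.OneHotEncoder()
encoder.fit([[0, 2, 1, 12], [1, 3, 5, 3], [2, 3, 2, 12], [1, 2, 4, 3]])

# Number of distinct values seen in each of the four columns
sizes = [len(categories) for categories in encoder.categories_]

# One slot per distinct value, concatenated column by column
encoded = encoder.transform([[2, 3, 5, 3]]).toarray()
print("Distinct values per column:", sizes)
print("Encoded vector length:", encoded.shape[1])
```

The four columns contribute 3, 2, 4 and 2 slots, so the encoded vector has 11 entries, exactly one of which is set within each column's group.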

Label encoding

How to do it…
from sklearn import preprocessing

# Fit the encoder on the known class names
label_encoder = preprocessing.LabelEncoder()
input_classes = ['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw']
label_encoder.fit(input_classes)

# Print each class with the integer assigned to it
print("\nClass mapping:")
for i, item in enumerate(label_encoder.classes_):
    print(item, '-->', i)
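
Once fitted, the encoder can convert in both directions. The sketch below is only an illustration: the `sample_labels` and `sample_codes` lists are made-up inputs, not values taken from the original post.

```python
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(['audi', 'ford', 'audi', 'toyota', 'ford', 'bmw'])

# Hypothetical inputs chosen for illustration
sample_labels = ['toyota', 'ford', 'audi']
sample_codes = [2, 1, 0, 3]

# Classes are stored in sorted order, so audi=0, bmw=1, ford=2, toyota=3
encoded_labels = list(label_encoder.transform(sample_labels))
decoded_labels = list(label_encoder.inverse_transform(sample_codes))
print("Encoded labels =", encoded_labels)
print("Decoded labels =", decoded_labels)
```

Because the mapping follows sorted order rather than order of first appearance, `ford` gets 2 even though it appears second in the training list.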

labels =
Author

Canoespock

Posted on

2023-01-23

Updated on

2023-01-23
