1. Scaling
Scaling means converting features from their natural range (for example, 100 – 900) down to a standard range (say -1 to 1 or 0 to 1). This is done during the pre-processing phase, mostly on features whose values vary significantly. Without scaling, the algorithm tends to give higher-magnitude values more weight than lower ones, so those values bias the model in a particular direction simply because the algorithm assumes they are more important. This is often not the desired result.
So how is feature scaling beneficial for machine learning algorithms?
- Helps gradient descent converge quicker
- Helps to avoid the “NaN” trap, where a number in the model becomes “NaN” (i.e. Not a Number), commonly when a value overflows floating-point precision during an arithmetic operation in training.
- Helps the model to learn appropriate weights for the features without their magnitudes dominating the training.
When to apply feature scaling?
Feature scaling is essential for ML models that calculate distances between data points. It is also essential for models in which we want faster convergence, like neural networks.
Some examples: K-Nearest Neighbours (KNN), K-Means, Principal Component Analysis (PCA), and gradient descent.
Techniques for scaling
a. Z-Score:
A popular scaling technique is to calculate the Z-score of each value. The Z-score expresses how many standard deviations a value lies from the mean: scaled_value = (value – mean) / stddev
Using Z-scores will usually give you values between -3 and +3.
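Here is a minimal sketch of Z-score scaling with NumPy, using a small hypothetical feature (the values are made up for illustration):

```python
import numpy as np

# Hypothetical feature with a wide natural range (e.g., 100 - 900)
values = np.array([100.0, 250.0, 480.0, 520.0, 900.0])

mean = values.mean()
stddev = values.std()

# Z-score: how many standard deviations each value lies from the mean
scaled = (values - mean) / stddev
print(scaled)  # most values fall roughly between -3 and +3
```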
b. Min-Max Normalization:
This method re-scales a feature to the range 0 to 1 using the following formula: scaled_value = (value – min) / (max – min)
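As a quick sketch, the same hypothetical values as above can be min-max normalized like this:

```python
import numpy as np

values = np.array([100.0, 250.0, 480.0, 520.0, 900.0])

# Min-max normalization: map the smallest value to 0 and the largest to 1
scaled = (values - values.min()) / (values.max() - values.min())
print(scaled)  # all values now lie in [0, 1]
```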
c. Standardization:
It is a very effective technique that re-scales a feature so that its distribution has a mean of 0 and a variance of 1.
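One way to apply standardization in practice is scikit-learn's StandardScaler; the sketch below assumes a single hypothetical feature column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One column per feature; StandardScaler expects a 2-D array
X = np.array([[100.0], [250.0], [480.0], [520.0], [900.0]])

scaler = StandardScaler()          # removes the mean and scales to unit variance
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean())  # approximately 0
print(X_scaled.std())   # approximately 1
```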