🧹 Data Preprocessing for Machine Learning
Before you teach your model, you must prepare your data! Let's dive in!
Step | Meaning | Example |
---|---|---|
Handling Missing Data | Fill or remove missing values in the dataset. | Replace a missing age with the average age. |
Encoding Categorical Data 🔤 | Convert text labels into numbers. | "Male" ➔ 1, "Female" ➔ 0 |
Feature Scaling | Standardize or normalize data ranges. | Rescale ages from 0-100 to the range 0-1. |
Splitting Dataset ✂️ | Divide data into training and testing parts. | 80% for training, 20% for testing. |
Handling Missing Data
- Use techniques like mean imputation (fill with the average) or deletion (drop incomplete rows).
- Most machine learning algorithms cannot handle blanks directly!
Example:
🔹 Fill a missing salary with the average salary of the group.
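Here is a minimal sketch of mean imputation in plain Python. The salary numbers and the helper name impute_mean are made up for illustration, and None is assumed to mark a missing entry.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical salaries for one group; None marks a missing entry.
salaries = [50000, None, 70000, 60000, None]
filled = impute_mean(salaries)
print(filled)
```

Deletion would instead keep only the rows where the value is present, which is simpler but throws data away.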
🔤 Encoding Categorical Data
- Most algorithms need numbers, not text.
- Label Encoding (one integer per category) or One-Hot Encoding (one 0/1 column per category) is typically used.
Example:
🔹 Turn "Yes" and "No" into 1 and 0.
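Both encoding techniques can be sketched in a few lines of plain Python. The helper names and the sample colors below are assumptions for illustration; categories are numbered in sorted order, which is just one possible convention.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Build one 0/1 column per distinct category, in sorted order."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories

answers = ["Yes", "No", "Yes"]
codes, mapping = label_encode(answers)
print(codes)  # [1, 0, 1] because "No" sorts before "Yes"

colors = ["red", "green", "blue", "green"]  # hypothetical data
rows, categories = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(rows[0])     # [0, 0, 1] for "red"
```

Label encoding implies an order between categories, so one-hot encoding is often the safer pick when the categories have no natural ranking.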
Feature Scaling
- Features with large values can dominate others, especially in distance-based or gradient-based models.
- Scaling puts all features on a comparable range.
Techniques:
🔹 Min-Max Scaling: (x - min) / (max - min)
🔹 Standardization (Z-score): (x - mean) / standard deviation
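The two techniques can be sketched like this. The ages are sample values, the helper names are hypothetical, and the population standard deviation is used for the Z-score (an assumption; some tools use the sample version instead).

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

ages = [0, 25, 50, 100]
scaled = min_max_scale(ages)
z_scores = standardize(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Min-Max keeps everything between 0 and 1, while standardization centers the data around 0 and is less squeezed by a single extreme value.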
✂️ Splitting Dataset
- Train your model on one part, test it on another.
- Helps you detect "overfitting" (memorizing instead of learning) by checking performance on data the model has never seen.
Tip:
🔹 A typical split is 80% train, 20% test.
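A shuffled 80/20 split might look like the sketch below. The function name split_train_test, the seed value, and the toy dataset are assumptions for illustration.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))  # hypothetical dataset of 10 rows
train, test = split_train_test(data)
print(len(train), len(test))  # 8 2
```

Shuffling before cutting matters when the rows are stored in some order (for example by date or by label), otherwise the test part may not look like the training part.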
🎯 Quick Challenge!
Why do we scale features?
By Darchums Technologies Inc - April 28, 2025