Spam Classifier Tutorial
Build and deploy your own Spam Classifier using Machine Learning
🔹 Step 1: Collecting the Data
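For this project, we need a dataset of messages labeled as spam or ham (legitimate, non-spam messages). Here’s a simple example, small enough to read at a glance but far too small to train a reliable model: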
import pandas as pd

# Sample Data
data = {
    'text': [
        "win a free prize now!",
        "congratulations, you won!",
        "hi, how are you?",
        "let's grab lunch tomorrow",
        "limited time offer, buy now!",
    ],
    'label': ['spam', 'spam', 'ham', 'ham', 'spam'],
}
df = pd.DataFrame(data)
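Before moving on, it can help to confirm the labels look the way you expect. This is a minimal sketch on the same five messages; the names `label_counts` and `spam_share` are introduced here for illustration only.

```python
import pandas as pd

data = {
    'text': [
        "win a free prize now!",
        "congratulations, you won!",
        "hi, how are you?",
        "let's grab lunch tomorrow",
        "limited time offer, buy now!",
    ],
    'label': ['spam', 'spam', 'ham', 'ham', 'spam'],
}
df = pd.DataFrame(data)

# Count how many messages carry each label
label_counts = df['label'].value_counts()
print(label_counts)

# Fraction of the sample that is spam
spam_share = (df['label'] == 'spam').mean()
print(f"Spam share: {spam_share:.2f}")  # Spam share: 0.60
```

On a real dataset, a heavily skewed split between spam and ham is worth knowing about early, because it affects how you read the accuracy figure later.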
🔹 Step 2: Preprocessing the Data
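We'll convert the text messages into numerical vectors using scikit-learn's **TF-IDF Vectorizer**, which weights each word by how often it appears in a message and how rare it is across all messages.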
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['text'])
y = df['label']
🔹 Step 3: Splitting Data into Training and Test Sets
Next, we’ll split the data into **training** and **test** sets to evaluate our model later.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
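One caveat: Step 2 fits the vectorizer on every message before the split, so the test messages influence the vocabulary and IDF weights. A possible alternative, sketched below, splits the raw text first and lets a scikit-learn `Pipeline` fit TF-IDF on the training messages only. The `stratify` argument and the variable names are assumptions added for this sketch, not part of the original steps.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

df = pd.DataFrame({
    'text': [
        "win a free prize now!",
        "congratulations, you won!",
        "hi, how are you?",
        "let's grab lunch tomorrow",
        "limited time offer, buy now!",
    ],
    'label': ['spam', 'spam', 'ham', 'ham', 'spam'],
})

# Split the raw text first so the vectorizer never sees test messages
text_train, text_test, y_train, y_test = train_test_split(
    df['text'], df['label'],
    test_size=0.3,
    random_state=42,
    stratify=df['label'],  # assumption: keep both labels in each split
)

# The pipeline fits TF-IDF on the training text only, then trains the classifier
pipeline = make_pipeline(TfidfVectorizer(stop_words='english'), MultinomialNB())
pipeline.fit(text_train, y_train)

# Raw strings go in; the pipeline vectorizes them before predicting
y_pred = pipeline.predict(text_test)
print(list(zip(text_test, y_pred)))
```

A side benefit of this design is that the pipeline is a single object, so there is only one artifact to save and load later.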
🔹 Step 4: Training the Model
We'll use the **Multinomial Naive Bayes** classifier, which is simple, fast to train, and a common baseline for text classification.
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)
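Once the model is trained, `predict_proba` shows how confident it is rather than just the winning label. The sketch below fits on all five sample messages purely for illustration, and `new_message` is a made-up input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now!",
    "congratulations, you won!",
    "hi, how are you?",
    "let's grab lunch tomorrow",
    "limited time offer, buy now!",
]
labels = ['spam', 'spam', 'ham', 'ham', 'spam']

vectorizer = TfidfVectorizer(stop_words='english')
model = MultinomialNB()
# Illustration only: fit on every message instead of a training split
model.fit(vectorizer.fit_transform(messages), labels)

new_message = ["claim your free prize"]  # hypothetical input
probabilities = model.predict_proba(vectorizer.transform(new_message))[0]

# model.classes_ gives the label order of the probability columns
for label, probability in zip(model.classes_, probabilities):
    print(f"{label}: {probability:.2f}")
```

Words the vectorizer has never seen (here, "claim") are simply ignored at prediction time, so the result rests on "free" and "prize" alone.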
🔹 Step 5: Evaluating the Model
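Now we will evaluate the model using accuracy and a classification report. With only two messages in the test set, treat these numbers as a smoke test rather than a real measure of quality.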
from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
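Accuracy alone hides which kind of mistake the model makes. A confusion matrix separates spam that slipped through from ham that was wrongly flagged. The labels below are hypothetical values chosen to make the arithmetic easy to follow; they are not results from the model above.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels, for illustration only
y_true = ['spam', 'ham', 'spam', 'ham']
y_hat = ['spam', 'ham', 'ham', 'ham']

# Rows are true labels, columns are predicted labels, both ordered ham then spam
cm = confusion_matrix(y_true, y_hat, labels=['ham', 'spam'])
print(cm)
# [[2 0]
#  [1 1]]

print(accuracy_score(y_true, y_hat))                      # 0.75
print(precision_score(y_true, y_hat, pos_label='spam'))   # 1.0
print(recall_score(y_true, y_hat, pos_label='spam'))      # 0.5
```

Here every message flagged as spam really was spam (precision 1.0), but half of the spam was missed (recall 0.5), which the 0.75 accuracy figure does not reveal on its own.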
🔹 Step 6: Saving the Model
To use the model later, we'll save it along with the vectorizer using **joblib**.
import joblib

joblib.dump(model, 'spam_model.pkl')
joblib.dump(vectorizer, 'tfidf_vectorizer.pkl')
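It is worth checking that the saved files load back and give the same predictions. The sketch below does a round trip through a temporary directory so it leaves nothing behind; the model is refit on the five sample messages for illustration, and `sample` is a made-up message.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now!",
    "congratulations, you won!",
    "hi, how are you?",
    "let's grab lunch tomorrow",
    "limited time offer, buy now!",
]
labels = ['spam', 'spam', 'ham', 'ham', 'spam']

vectorizer = TfidfVectorizer(stop_words='english')
model = MultinomialNB().fit(vectorizer.fit_transform(messages), labels)

sample = ["buy now, limited offer"]  # hypothetical message
before = model.predict(vectorizer.transform(sample))

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'spam_model.pkl')
    vectorizer_path = os.path.join(tmp_dir, 'tfidf_vectorizer.pkl')

    # Save both artifacts, then load them back as fresh objects
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# The reloaded pair should reproduce the original prediction
after = loaded_model.predict(loaded_vectorizer.transform(sample))
print(before[0], after[0])
```

The model and vectorizer must always be saved and loaded as a matching pair, since the model's feature columns only make sense with the vocabulary that vectorizer learned.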
🔹 Step 7: Deploying the Model with Flask
Let's create a **Flask API** to serve the model.
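from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the artifacts saved in Step 6
model = joblib.load('spam_model.pkl')
vectorizer = joblib.load('tfidf_vectorizer.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True)
    if not data or 'message' not in data:
        return jsonify({'error': "JSON body must include a 'message' field"}), 400
    message_vectorized = vectorizer.transform([data['message']])
    prediction = model.predict(message_vectorized)
    return jsonify({'prediction': str(prediction[0])})

if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True)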
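You can exercise the endpoint without starting a server by using Flask's built-in test client. The sketch below is self-contained: it trains a tiny stand-in model in memory rather than loading the `.pkl` files, which is an assumption made so the example runs on its own.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stand-in for the saved artifacts: a tiny model trained in memory
messages = [
    "win a free prize now!",
    "congratulations, you won!",
    "hi, how are you?",
    "let's grab lunch tomorrow",
    "limited time offer, buy now!",
]
labels = ['spam', 'spam', 'ham', 'ham', 'spam']
vectorizer = TfidfVectorizer(stop_words='english')
model = MultinomialNB().fit(vectorizer.fit_transform(messages), labels)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    message = data['message']
    prediction = model.predict(vectorizer.transform([message]))
    return jsonify({'prediction': str(prediction[0])})

# The test client sends requests straight to the app, no server needed
client = app.test_client()
response = client.post('/predict', json={'message': 'win a free prize now!'})
print(response.status_code, response.get_json())
```

Against a running server, the same request is a POST to `/predict` with a JSON body such as `{"message": "win a free prize now!"}`.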
🔹 Step 8: Front-End Interface (HTML + JavaScript)
Create a user-friendly interface where users can input their messages and get real-time predictions.
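One way to sketch such an interface while staying in the Flask app is to serve a small HTML page from Python. Everything here is an assumption for illustration: the `/` route, the `PAGE` string, the element ids, and the page layout. The embedded script posts to `/predict`, so in practice this route would be added to the same app that defines the endpoint from Step 7.

```python
from flask import Flask

app = Flask(__name__)

# Hypothetical single-page front end: a text box, a button, and a result line
PAGE = """<!doctype html>
<html>
<head><meta charset="utf-8"><title>Spam Classifier</title></head>
<body>
  <h1>Test Your Own Message!</h1>
  <textarea id="message" rows="4" cols="50"></textarea><br>
  <button id="check">Check</button>
  <p id="result"></p>
  <script>
    document.getElementById('check').addEventListener('click', async () => {
      const message = document.getElementById('message').value;
      // Send the message to the prediction endpoint as JSON
      const response = await fetch('/predict', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({message: message})
      });
      const data = await response.json();
      document.getElementById('result').textContent =
        'Prediction: ' + String(data.prediction).toUpperCase();
    });
  </script>
</body>
</html>"""

@app.route('/')
def index():
    # Flask returns a plain string as an HTML response
    return PAGE

# Quick check with the test client that the page is served
page_response = app.test_client().get('/')
print(page_response.status_code)
```

Serving the page from the same origin as `/predict` avoids cross-origin request issues, which is why this sketch keeps both in one app.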