DarchumsTech

Spam Classifier Tutorial

Build and deploy your own Spam Classifier using Machine Learning

🔹 Step 1: Collecting the Data

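For this project, we need a dataset of messages, each labeled as Spam or Ham (not spam). The five-message sample below is only enough to walk through the steps; a usable classifier needs a much larger labeled dataset.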

import pandas as pd

# Sample data: five short messages with their labels
data = {
    'text': ["win a free prize now!", "congratulations, you won!", "hi, how are you?",
             "let's grab lunch tomorrow", "limited time offer, buy now!"],
    'label': ['spam', 'spam', 'ham', 'ham', 'spam'],
}
df = pd.DataFrame(data)
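
Before moving on, it can help to check how many messages fall in each class, since a heavily imbalanced dataset makes accuracy misleading. A minimal sketch, rebuilding the same sample data so the snippet runs on its own:

```python
import pandas as pd

# Same toy data as in Step 1, repeated so this snippet is self-contained
df = pd.DataFrame({
    'text': ["win a free prize now!", "congratulations, you won!", "hi, how are you?",
             "let's grab lunch tomorrow", "limited time offer, buy now!"],
    'label': ['spam', 'spam', 'ham', 'ham', 'spam'],
})

# Count messages per label
label_counts = df['label'].value_counts()
print(label_counts)

# Fraction of the dataset that is spam
spam_ratio = (df['label'] == 'spam').mean()
print(f"Spam ratio: {spam_ratio:.2f}")
```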

🔹 Step 2: Preprocessing the Data

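We'll convert the text messages into numerical vectors using scikit-learn's **TF-IDF Vectorizer**, which weights each word by how often it appears in a message and how rare it is across all messages.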

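from sklearn.feature_extraction.text import TfidfVectorizer
# Initialize the TF-IDF vectorizer, dropping common English stop words
vectorizer = TfidfVectorizer(stop_words='english')
# Learn the vocabulary and convert each message into a sparse TF-IDF row.
# Fitting on the full dataset before splitting is fine for a demo; on real data, fit on the training set only.
X = vectorizer.fit_transform(df['text'])
y = df['label']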
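
To make the weighting concrete, here is a rough sketch of the computation in plain Python. It assumes scikit-learn's default settings (smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization) and uses a naive whitespace tokenizer with no stop-word removal, so the numbers will not match the `vectorizer` output exactly. The helper name `tfidf_vectors` is made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized docs."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of docs that contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit length (L2 norm)
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors

docs = [
    "win a free prize now".split(),
    "limited time offer buy now".split(),
    "hi how are you".split(),
]
vocab, vectors = tfidf_vectors(docs)

# "now" appears in two messages and "prize" in one, so "prize" gets the larger weight
first = dict(zip(vocab, vectors[0]))
print(f"now:   {first['now']:.3f}")
print(f"prize: {first['prize']:.3f}")
```

Words that occur in many messages carry less information about any one of them, which is why the rarer word ends up with the higher weight.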

🔹 Step 3: Splitting Data into Training and Test Sets

Next, we’ll split the data into **training** and **test** sets to evaluate our model later.

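from sklearn.model_selection import train_test_split
# stratify=y keeps both classes in the training and test sets, which matters with so few samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)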
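
The `stratify` argument of `train_test_split` keeps the spam/ham proportions roughly equal in both sets, which matters with so few messages. A quick sketch for inspecting the class counts on each side of the split (plain lists stand in for the TF-IDF matrix so the snippet runs on its own):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

texts = ["win a free prize now!", "congratulations, you won!", "hi, how are you?",
         "let's grab lunch tomorrow", "limited time offer, buy now!"]
labels = ['spam', 'spam', 'ham', 'ham', 'spam']

# Stratified split: each class is divided between the two sets
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

train_counts = Counter(labels_train)
test_counts = Counter(labels_test)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```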

🔹 Step 4: Training the Model

We'll use the **Multinomial Naive Bayes** classifier, a simple and fast baseline that works well for text classification on word-count and TF-IDF features.

from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(X_train, y_train)
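
To see what the classifier is doing, here is a minimal from-scratch sketch of multinomial Naive Bayes on raw word counts with add-one (Laplace) smoothing. It is an illustration only, not what `MultinomialNB` runs internally on TF-IDF features, and the function names `train_nb` and `predict_nb` are invented for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(messages, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(text, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Log prior: fraction of training messages in this class
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (class, word) pairs from zeroing the product
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

messages = ["win a free prize now", "limited time offer buy now",
            "hi how are you", "lets grab lunch tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

params = train_nb(messages, labels)
print(predict_nb("free prize offer", *params))  # prints "spam"
print(predict_nb("how about lunch", *params))   # prints "ham"
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer messages.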

🔹 Step 5: Evaluating the Model

Now we will evaluate the model using accuracy and a classification report, which lists precision, recall, and F1-score for each class.

from sklearn.metrics import accuracy_score, classification_report
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# zero_division=0 avoids warnings when a class is never predicted in a tiny test set
print(classification_report(y_test, y_pred, zero_division=0))
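
For reference, the headline numbers in the report can be computed by hand. The sketch below uses made-up true and predicted labels (not results from the model in this tutorial) and treats `spam` as the positive class.

```python
# Hypothetical labels, chosen only to illustrate the arithmetic
y_true = ['spam', 'ham', 'spam', 'ham', 'spam', 'ham']
y_hat = ['spam', 'ham', 'ham', 'ham', 'spam', 'spam']

pairs = list(zip(y_true, y_hat))
tp = sum(t == 'spam' and p == 'spam' for t, p in pairs)  # spam caught
tn = sum(t == 'ham' and p == 'ham' for t, p in pairs)    # ham passed through
fp = sum(t == 'ham' and p == 'spam' for t, p in pairs)   # ham flagged as spam
fn = sum(t == 'spam' and p == 'ham' for t, p in pairs)   # spam missed

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of the messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of the real spam, how much was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```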

🔹 Step 6: Saving the Model

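To use the model later, we'll save it along with the vectorizer using **joblib**. The vectorizer has to be saved too, because new messages must be transformed with the same vocabulary the model was trained on.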

import joblib
joblib.dump(model, 'spam_model.pkl')
joblib.dump(vectorizer, 'tfidf_vectorizer.pkl')
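
A quick round-trip check can confirm that the saved files reload and give the same predictions. This sketch trains a throwaway model on the toy data and writes to a temporary directory (an assumption made here so the example does not overwrite real files); the file names match the ones used in this step.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now!", "congratulations, you won!", "hi, how are you?",
         "let's grab lunch tomorrow", "limited time offer, buy now!"]
labels = ['spam', 'spam', 'ham', 'ham', 'spam']

# Throwaway model trained on all five messages, just to have something to save
vectorizer = TfidfVectorizer(stop_words='english')
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'spam_model.pkl')
    vectorizer_path = os.path.join(tmp_dir, 'tfidf_vectorizer.pkl')
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)

    # Reload both artifacts from disk
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# The reloaded pair should predict exactly what the in-memory pair predicts
new_messages = ["free prize, buy now!", "lunch tomorrow?"]
original = list(model.predict(vectorizer.transform(new_messages)))
reloaded = list(loaded_model.predict(loaded_vectorizer.transform(new_messages)))
print(original == reloaded)
```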

🔹 Step 7: Deploying the Model with Flask

Let's create a **Flask API** to serve the model.

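from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the trained model and the fitted vectorizer once, at startup
model = joblib.load('spam_model.pkl')
vectorizer = joblib.load('tfidf_vectorizer.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True)
    if not isinstance(data, dict) or not isinstance(data.get('message'), str):
        return jsonify({'error': 'Request body must be JSON with a string "message" field'}), 400
    message_vectorized = vectorizer.transform([data['message']])
    prediction = model.predict(message_vectorized)
    # Convert the NumPy string to a plain str before serializing
    return jsonify({'prediction': str(prediction[0])})

if __name__ == '__main__':
    # debug=True is for local development only; turn it off in production
    app.run(debug=True)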
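
Once the server is running, the endpoint can be exercised from another Python process. The sketch below uses only the standard library and assumes Flask's default development address, `http://127.0.0.1:5000`; the helper names `build_request` and `classify` are made up for this example.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"  # assumed: Flask's default dev host and port

def build_request(message, url=API_URL):
    """Build a JSON POST request carrying the message to classify."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def classify(message, url=API_URL):
    """Send the message to the API and return the predicted label."""
    with urllib.request.urlopen(build_request(message, url), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["prediction"]

# Usage, with the Flask app running in another terminal:
#   print(classify("win a free prize now!"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.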

🔹 Step 8: Front-End Interface (HTML + JavaScript)

Create a user-friendly interface where users can input their messages and get real-time predictions.

The page is headed "Test Your Own Message!" and prompts the user to type a message and see whether it is predicted as SPAM or HAM.