Transformers for NLP with Hugging Face

Transformers have revolutionized NLP tasks like translation, summarization, and sentiment analysis. In this tutorial, we’ll use Hugging Face Transformers to fine-tune a BERT model for sentiment classification.

🧠 What is a Transformer?

Transformers use attention mechanisms to model relationships between all words in a sentence at once. Unlike RNNs or LSTMs, which process tokens one step at a time, they handle the whole sequence in parallel and let every token attend directly to every other token.
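
To make the idea concrete, here is a toy scaled dot-product attention computation in PyTorch; the tensors and sizes are arbitrary illustrations, not taken from a real model:

import torch
import torch.nn.functional as F

seq_len, d_model = 4, 8                        # 4 "words", 8-dimensional embeddings
x = torch.randn(seq_len, d_model)              # token embeddings
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / (d_model ** 0.5)            # similarity between every pair of tokens
weights = F.softmax(scores, dim=-1)            # attention weights sum to 1 for each token
output = weights @ V                           # each token becomes a weighted mix of all tokens
print(weights.shape)                           # torch.Size([4, 4]): all-pairs attention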

🔧 Prerequisites

  • Python 3.7+
  • The transformers, datasets, and torch packages (installed via pip in Step 1)
  • Basic understanding of NLP

🔨 Step-by-Step: Fine-Tune BERT for Sentiment Analysis

Step 1: Install Dependencies

!pip install transformers datasets torch

Step 2: Load Dataset and Model

from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification

dataset = load_dataset("imdb")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
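
Loading the model prints a warning that some weights are newly initialized; that is expected, since the classification head on top of BERT has not been trained yet. A quick look at what load_dataset returned:

print(dataset)                          # DatasetDict with train / test / unsupervised splits
print(dataset["train"].features)        # 'text' is a string, 'label' is 0 (negative) or 1 (positive)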

Step 3: Tokenize the Data

def tokenize_function(example):
    return tokenizer(example["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
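
It helps to peek at one tokenized example. With padding="max_length" and no explicit max_length, the tokenizer pads every review to BERT's 512-token limit:

sample = tokenized_datasets["train"][0]
print(list(sample.keys()))        # text, label, input_ids, token_type_ids, attention_mask
print(len(sample["input_ids"]))   # 512: every review is padded or truncated to the same length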

Step 4: Create DataLoaders (Optional)

The Trainer in Step 5 builds its own batches internally, so this step is only needed if you want to write a manual PyTorch training loop. Before batching, drop the raw text column and rename the label column to the name the model expects:

from torch.utils.data import DataLoader
from transformers import DataCollatorWithPadding

tokenized_datasets = tokenized_datasets.remove_columns(["text"]).rename_column("label", "labels")

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
train_dataloader = DataLoader(tokenized_datasets["train"], batch_size=8, shuffle=True, collate_fn=data_collator)
eval_dataloader = DataLoader(tokenized_datasets["test"], batch_size=8, collate_fn=data_collator)
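
If you would rather not use the Trainer API at all, these loaders plug directly into a plain PyTorch loop. A minimal single-epoch sketch, assuming the common 2e-5 learning rate and a GPU if one is available:

import torch
from torch.optim import AdamW

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in train_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    outputs = model(**batch)      # loss is computed automatically because "labels" is in the batch
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()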

Step 5: Set Up Training Arguments and Trainer

The Trainer creates an AdamW optimizer with sensible defaults under the hood, so there is no separate optimizer setup here. If your version of transformers rejects evaluation_strategy, the argument was renamed to eval_strategy in newer releases.

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir="./results", evaluation_strategy="epoch", per_device_train_batch_size=8)
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_datasets["train"], eval_dataset=tokenized_datasets["test"], tokenizer=tokenizer)

Step 6: Train the Model

trainer.train()

✅ Evaluate the Model

results = trainer.evaluate()
print(results)
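
By default, evaluate() mostly reports the evaluation loss. To also get accuracy, pass a compute_metrics function when constructing the Trainer (i.e. add compute_metrics=compute_metrics to the Trainer call in Step 5). A minimal sketch:

import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred                    # raw model outputs and true labels
    predictions = np.argmax(logits, axis=-1)      # pick the highest-scoring class per example
    return {"accuracy": (predictions == labels).mean()}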

📌 Tips

  • Use a GPU for faster training; a Google Colab GPU runtime works well
  • Start with a smaller subset of the data to prototype (see the snippet after this list)
  • Use DistilBERT for faster training and inference (also shown below)
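
For instance, the last two tips amount to a couple of small changes made before tokenization; the 2,000/500 subset sizes below are arbitrary choices for quick experiments:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Prototype on a small, shuffled subset of IMDB instead of the full 25k reviews
small_train = dataset["train"].shuffle(seed=42).select(range(2000))
small_eval = dataset["test"].shuffle(seed=42).select(range(500))

# Swap BERT for DistilBERT, a smaller distilled model that trains and serves faster
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)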

🎯 Conclusion

Transformers like BERT allow you to achieve state-of-the-art NLP performance with minimal effort. Try modifying this code for your own datasets or tasks like spam detection or topic classification.
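
As a starting point for your own experiments, here is one way to run the fine-tuned model on new text with the pipeline API (the example sentence is just a placeholder; labels appear as LABEL_0/LABEL_1 unless you configure id2label):

from transformers import pipeline

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("This movie was an absolute delight from start to finish."))
# returns a list of {'label': ..., 'score': ...} dicts; LABEL_1 corresponds to the positive class here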

🔜 Next Up

Coming soon: Reinforcement Learning with OpenAI Gym — stay tuned!
