🕹️ Reinforcement Learning with OpenAI Gym
In this tutorial, we explore Reinforcement Learning (RL) using Q-learning with OpenAI Gym. We'll train an agent to play the FrozenLake game using Python.
📦 Prerequisites
- Basic Python knowledge
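- Install required libraries (the code below uses the Gym 0.26+ API, where reset() returns a tuple and step() returns five values):
pip install gym numpy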
- Conceptual knowledge of RL (states, actions, rewards)
🎯 Step-by-Step Q-Learning with FrozenLake
Step 1: Import Libraries
import gym
import numpy as np
import random
Step 2: Initialize Environment
env = gym.make("FrozenLake-v1", is_slippery=False)
state_size = env.observation_space.n
action_size = env.action_space.n
q_table = np.zeros((state_size, action_size))
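To make the 16 states concrete, here is a small sketch of how FrozenLake numbers its tiles. It assumes the default 4x4 map and uses only plain Python, so it runs without Gym. The helper name state_to_tile is made up for this illustration and is not part of Gym.

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]

def state_to_tile(state, grid=DEFAULT_MAP):
    """Return (row, col, tile letter) for a flat state index."""
    n_cols = len(grid[0])
    row, col = divmod(state, n_cols)  # states are numbered row by row
    return row, col, grid[row][col]

# Collect the state indices that are holes
holes = [s for s in range(16) if state_to_tile(s)[2] == "H"]

print(state_to_tile(0))   # start tile
print(state_to_tile(15))  # goal tile
print(holes)
```

Each row of q_table corresponds to one of these state indices, and each of its four columns to one action.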
Step 3: Define Hyperparameters
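total_episodes = 10000   # Number of training episodes
learning_rate = 0.8      # How strongly each update overwrites the old Q-value
max_steps = 100          # Step limit per episode
gamma = 0.95             # Discount rate
epsilon = 1.0            # Current exploration rate
max_epsilon = 1.0        # Exploration rate at the start of training
min_epsilon = 0.01       # Floor for the exploration rate
decay_rate = 0.005       # Exponential decay rate for epsilon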
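It can help to see how quickly exploration fades with these values. The sketch below evaluates the same exponential schedule that the training step uses, with math.exp in place of np.exp so it needs no extra libraries. The function name epsilon_at is an illustrative choice.

```python
import math

max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005

def epsilon_at(episode):
    """Exploration rate after the given number of episodes."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)

# Print the schedule at a few points during training
for episode in (0, 100, 500, 1000, 5000):
    print(episode, round(epsilon_at(episode), 4))
```

With these settings epsilon drops below 0.02 by episode 1000, so most of the 10000 episodes are spent acting almost greedily. A smaller decay_rate would keep the agent exploring for longer.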
Step 4: Train the Agent
for episode in range(total_episodes):
    state = env.reset()[0]  # reset() returns (observation, info)
    for step in range(max_steps):
        # Epsilon-greedy: exploit the best known action, otherwise explore
        if random.uniform(0, 1) > epsilon:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()
        new_state, reward, done, truncated, _ = env.step(action)
        # Q-learning update: move Q(s, a) toward reward + gamma * max Q(s', a')
        q_table[state, action] += learning_rate * (
            reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action]
        )
        state = new_state
        # Stop on a terminal state (goal or hole) or when the time limit is hit
        if done or truncated:
            break
    # Decay epsilon once per episode so exploration shrinks over time
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
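To see what the update rule does to a single table entry, here is a worked example in plain Python. It assumes the learning_rate and gamma values from Step 3 and an empty table. The function name q_update is illustrative only.

```python
learning_rate = 0.8
gamma = 0.95

def q_update(q_old, reward, max_q_next):
    """One tabular Q-learning step for a single (state, action) entry."""
    return q_old + learning_rate * (reward + gamma * max_q_next - q_old)

# Move that steps onto the goal: reward 1, and the terminal state is worth 0
q_goal_move = q_update(q_old=0.0, reward=1.0, max_q_next=0.0)

# Move one tile earlier: no reward, but the next state is now worth q_goal_move
q_prev_move = q_update(q_old=0.0, reward=0.0, max_q_next=q_goal_move)

print(q_goal_move, q_prev_move)
```

The second value is nonzero even though that move earned no reward. This is how the goal reward propagates backward through the table, one tile per update, discounted by gamma at each step.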
Step 5: Evaluate the Agent
total_rewards = 0
for episode in range(100):
    state = env.reset()[0]
    for step in range(max_steps):
        action = np.argmax(q_table[state, :])  # always act greedily, no exploration
        new_state, reward, done, truncated, _ = env.step(action)
        total_rewards += reward
        state = new_state
        if done or truncated:
            break
print("Average reward:", total_rewards / 100)
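Besides the average reward, it can be useful to look at the policy the table encodes. The sketch below is one possible way to print the greedy action per state as arrows. It assumes FrozenLake's action order (0 = left, 1 = down, 2 = right, 3 = up). The helper policy_rows and the tiny demo table are invented for illustration.

```python
ARROWS = ["←", "↓", "→", "↑"]  # FrozenLake actions 0..3: left, down, right, up

def policy_rows(q_table, n_cols=4):
    """Return one string per grid row with the greedy action for each state."""
    symbols = []
    for row in q_table:
        # Index of the largest Q-value; ties go to the lowest index, like np.argmax
        best = max(range(len(row)), key=lambda a: row[a])
        symbols.append(ARROWS[best])
    return ["".join(symbols[i:i + n_cols]) for i in range(0, len(symbols), n_cols)]

# Tiny made-up table with 4 states in a 2x2 grid, only to show the output format
demo_q = [
    [0.0, 0.5, 0.9, 0.0],
    [0.0, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.7, 0.2],
    [0.3, 0.0, 0.0, 0.0],
]
print("\n".join(policy_rows(demo_q, n_cols=2)))
```

After training, print("\n".join(policy_rows(q_table))) should work the same way on the NumPy table. States the agent never acted from (holes, the goal) keep all-zero rows and show a left arrow, which carries no meaning.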
📌 Tips
- Use is_slippery=True for a harder environment where moves can slide in an unintended direction
- Try other environments like CartPole-v1 and MountainCar-v0 (their observations are continuous, so they must be discretized before a Q-table can index them)
- Consider Deep Q-Learning with Neural Networks for large state spaces
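CartPole-v1 and MountainCar-v0 return continuous observations, so a Q-table cannot index them directly. One common workaround before moving to Deep Q-Learning is to bin each value. The sketch below is a minimal version of that idea. The helper names to_bin and discretize and the bin count are assumptions for illustration; the bounds are MountainCar-v0's position and velocity limits.

```python
def to_bin(value, low, high, n_bins):
    """Map a continuous value in [low, high] to an integer bin 0..n_bins-1."""
    clipped = min(max(value, low), high)  # keep out-of-range values inside the bounds
    fraction = (clipped - low) / (high - low)
    return min(int(fraction * n_bins), n_bins - 1)

def discretize(observation, bounds, n_bins):
    """Turn a sequence of floats into a tuple usable as a Q-table key."""
    return tuple(to_bin(v, lo, hi, n_bins) for v, (lo, hi) in zip(observation, bounds))

# MountainCar-v0: position in [-1.2, 0.6], velocity in [-0.07, 0.07]
bounds = [(-1.2, 0.6), (-0.07, 0.07)]
print(discretize([-0.5, 0.0], bounds, n_bins=10))
```

The resulting tuple can index a NumPy array of shape (10, 10, action_size) or serve as a dictionary key. Finer bins give a more precise policy but a larger table that takes longer to fill.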
🎯 Summary
Reinforcement Learning allows an agent to learn optimal behavior through reward signals. OpenAI Gym provides ready-made environments behind a common reset/step interface, which makes it easy to experiment with algorithms like Q-learning.
🔜 Coming Soon:
Build Your Own Neural Network from Scratch in Python (No Libraries!)