Reinforcement Learning with OpenAI GYM

Aadhil imam
4 min read · May 20, 2020

We are going to explore some of the reinforcement learning puzzles on OpenAI Gym and teach an agent to play them. OpenAI Gym is pretty much like a gym for reinforcement learning: it lets us test different algorithms on various simulated environments, with the overall goal of maximizing the reward gained from interacting with each environment. So first, let's see what reinforcement learning is.

Reinforcement Learning

Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with the answer key, so the model is trained with the correct answer itself; in reinforcement learning, there is no answer key, and the agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its own experience.

Here is a simple diagram that shows how an agent works.

An agent in a current state (St) takes an action (At), to which the environment reacts and responds, returning a new state (St+1) and reward (Rt+1) to the agent. Given the updated state and reward, the agent chooses the next action, and the loop repeats until the environment is solved or the episode terminates.

OpenAI Gym is an open-source toolkit for developing and comparing reinforcement learning algorithms, which gives you access to a standardized set of environments.

So let's install OpenAI Gym and get some hands-on practice, starting with the classic control environments.

pip install gym

Let's start building a reinforcement learning agent for OpenAI Gym's classic CartPole-v1 environment.
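As a quick sanity check, here is a minimal sketch that creates the environment and drives it with random actions. It assumes the classic Gym API from around this time (env.reset returns an observation, env.step returns a 4-tuple); newer Gym/Gymnasium releases return slightly different values.

```python
import gym

# Create the CartPole environment (classic Gym API).
env = gym.make('CartPole-v1')

state = env.reset()          # initial observation of the environment
done = False
total_reward = 0.0

while not done:
    env.render()                          # draw the cart and pole (optional)
    action = env.action_space.sample()    # pick a random action: 0 or 1
    state, reward, done, info = env.step(action)
    total_reward += reward

print('Episode finished with total reward:', total_reward)
env.close()
```

A random agent like this usually balances the pole for only a handful of steps, which is why we want a smarter agent.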

If we want to design our own agent to play optimally in this environment, we will need to understand a bit more about the environment's interface.

Let's check out the CartPole environment on its GitHub page. There we see that the environment has an observation table, which is essentially the current state of the environment: it specifies the possible values for the cart's position along the line, its velocity, and also the angle and angular velocity of the pole.

Observation Table
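We can read the same information directly from the environment object. A small sketch, again using the classic Gym API:

```python
import gym

env = gym.make('CartPole-v1')

# The observation space is a Box of 4 continuous values:
# index 0: cart position, 1: cart velocity, 2: pole angle, 3: pole angular velocity
print(env.observation_space)        # Box(4,)
print(env.observation_space.low)    # lower bound for each value
print(env.observation_space.high)   # upper bound for each value
```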

Then we have an action table, which tells us what each action index corresponds to: zero means push the cart to the left and one means push it to the right. We can actually access all of this information from our env object by looking at its attributes.

Action Table
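Here is a minimal sketch of how to inspect the action space from the env object:

```python
import gym

env = gym.make('CartPole-v1')

# The action space is Discrete(2): action 0 pushes the cart left, action 1 pushes it right.
print(env.action_space)            # Discrete(2)
print(env.action_space.n)          # 2 possible actions
print(env.action_space.sample())   # a random valid action (0 or 1)
```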

Now let's create an agent class that uses this information from the environment to choose its actions.
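A minimal sketch of such an agent, assuming a simple hand-crafted policy that pushes the cart in the direction the pole is leaning (the Agent class and get_action method names here are one possible choice):

```python
class Agent:
    def __init__(self, env):
        # Remember the sizes of the action and observation spaces.
        self.action_size = env.action_space.n
        self.observation_size = env.observation_space.shape[0]

    def get_action(self, state):
        # The pole angle is stored at index 2 of the observation.
        pole_angle = state[2]
        # Push the cart right (1) if the pole leans right, otherwise push left (0).
        return 1 if pole_angle > 0 else 0
```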

Then we need to create an agent object and call its get_action method to get an action for the environment. From the observation table we can see that the information about the pole angle is at index 2 of the observation space, so we need to pass the current state to our agent when choosing an action for the next time step. It turns out env.reset actually returns the initial state, and env.step returns a tuple containing the next state, the reward for the last time step, and whether or not the episode reached a terminal state.
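Putting it all together, here is a minimal sketch of the interaction loop, using the Agent class sketched above and the classic Gym 4-tuple from env.step:

```python
import gym

env = gym.make('CartPole-v1')
agent = Agent(env)                  # the Agent class defined above

state = env.reset()                 # env.reset returns the initial state
done = False
total_reward = 0.0

while not done:
    env.render()
    action = agent.get_action(state)               # choose an action from the current state
    state, reward, done, info = env.step(action)   # next state, reward, terminal flag, debug info
    total_reward += reward

print('Total reward:', total_reward)
env.close()
```

This simple pole-angle policy already keeps the pole up noticeably longer than random actions, which shows how much the agent gains just from reading the state.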

Now you have an idea of how to set up a basic interface for interacting with an environment on OpenAI Gym, along with a basic understanding of reinforcement learning.
