[ROS Projects] OpenAI with Moving Cube Robot in Gazebo Step-by-Step Part1

In this new ROS project, you will learn step by step how to create a moving cube robot and have it learn to move using an OpenAI environment.


Part 1

This first video covers the creation of the URDF and the control system.

Moving Cube Git: https://bitbucket.org/theconstructcore/moving_cube/src/master/

Step 1. Create a project in ROS Development Studio (ROSDS)

We’ll use ROSDS throughout this project to avoid setting up the environment, managing packages, and so on. If you don’t have an account yet, you can create a free one here.

Step 2. Create package

Since this is a simulation, let’s create a package called my_moving_cube_description under the simulation_ws.

cd ~/simulation_ws/src
catkin_create_pkg my_moving_cube_description rospy

We’ll start by building the URDF description of the robot. To do that, we’ll create a new folder called urdf under the my_moving_cube_description directory and create a file called my_moving_cube.urdf inside it with the following initial content. The robot tag indicates the name of the robot – my_moving_cube.

<robot name="my_moving_cube">

Then let’s create the first link inside the robot. A link has three parts:

  1. inertial: It defines the physical property of the link. You can calculate the inertia of an object by using this tool: rosrun spawn_robot_tools_pkg inertial_calculator.py
  2. collision: It defines the collision property when the object interacts with other objects in the simulation.
  3. visual: It defines the visual property, how the object will visually show in the simulation.

    <link name="cube_body">
        <inertial>
            <origin xyz="0 0 0" rpy="0 0 0"/>
            <mass value="0.5" />
            <inertia ixx="0.00333333333333" ixy="0.0" ixz="0.0" iyy="0.00333333333333" iyz="0.0" izz="0.00333333333333"/>
        </inertial>
        <collision>
            <origin xyz="0 0 0" rpy="0 0 0"/>
            <geometry><box size="0.2 0.2 0.2"/></geometry>
        </collision>
        <visual>
            <geometry><box size="0.2 0.2 0.2"/></geometry>
        </visual>
    </link>
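The inertia values in the link above match the textbook formula for a solid box. The sketch below is not the inertial_calculator.py tool itself; box_inertia is a hypothetical helper written only to show where the numbers come from.

```python
def box_inertia(mass, x, y, z):
    """Principal moments of inertia of a solid box about its centre of mass."""
    ixx = mass * (y ** 2 + z ** 2) / 12.0
    iyy = mass * (x ** 2 + z ** 2) / 12.0
    izz = mass * (x ** 2 + y ** 2) / 12.0
    return ixx, iyy, izz


# Cube from the URDF: 0.5 kg with 0.2 m sides.
ixx, iyy, izz = box_inertia(0.5, 0.2, 0.2, 0.2)
print("ixx=%.12f iyy=%.12f izz=%.12f" % (ixx, iyy, izz))
```

For a cube all three values are equal, which is why ixx, iyy and izz are identical in the URDF.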

You also need to define the material property after the link if you want to use it in Gazebo. (NOTICE: the reference attribute of the gazebo tag must match the name of the link.)

    <gazebo reference="cube_body">

Then we can create a spawn_moving_cube.launch file under the my_moving_cube_description/launch directory with the following content to spawn the cube.

<?xml version="1.0" encoding="UTF-8"?>
<launch>
    <include file="$(find spawn_robot_tools_pkg)/launch/spawn_robot_urdf.launch">
        <arg name="x" default="0.0" />
        <arg name="y" default="0.0" />
        <arg name="z" default="0.11" />
        <arg name="roll" default="0"/>
        <arg name="pitch" default="0"/>
        <arg name="yaw" default="0.0" />
        <arg name="urdf_robot_file" default="$(find my_moving_cube_description)/urdf/my_moving_cube.urdf" />
        <arg name="robot_name" default="my_moving_cube" />
    </include>
</launch>

Now go to Simulations -> Empty to launch an empty world. Then go to Tools -> Shell and run the command

roslaunch my_moving_cube_description spawn_moving_cube.launch

The cube should now appear in the empty world.

Similarly, we can add another link called inertia_wheel_roll

    <link name="inertia_wheel_roll">
        <inertial>
            <origin xyz="0 0 0" rpy="0 0 0"/>
            <mass value="0.5" />
            <inertia ixx="0.000804166666667" ixy="0.0" ixz="0.0" iyy="0.000804166666667" iyz="0.0" izz="0.0016"/>
        </inertial>
        <collision>
            <origin xyz="0 0 0" rpy="0 0 0"/>
            <geometry><cylinder radius="0.08" length="0.01"/></geometry>
        </collision>
        <visual>
            <geometry><cylinder radius="0.08" length="0.01"/></geometry>
        </visual>
    </link>
    <gazebo reference="inertia_wheel_roll">
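The wheel's inertia values are consistent with the standard formula for a solid cylinder whose axis is the local z axis. This is a minimal sketch under that assumption; cylinder_inertia is a hypothetical helper, not part of any ROS package.

```python
def cylinder_inertia(mass, radius, length):
    """Moments of inertia of a solid cylinder whose axis is the local z axis."""
    ixx = iyy = mass * (3.0 * radius ** 2 + length ** 2) / 12.0
    izz = mass * radius ** 2 / 2.0
    return ixx, iyy, izz


# Wheel from the URDF: 0.5 kg, radius 0.08 m, length 0.01 m.
ixx, iyy, izz = cylinder_inertia(0.5, 0.08, 0.01)
print("ixx=%.15f iyy=%.15f izz=%.4f" % (ixx, iyy, izz))
```

The largest value (izz) is the one about the spin axis, which is the axis the joint below rotates around.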

Then we also need to define the joint type which connects these two links

    <joint name="inertia_wheel_roll_joint" type="continuous">
        <origin xyz="0.1 0.0 0.0" rpy="0 1.57 0"/>
        <parent link="cube_body"/>
        <child  link="inertia_wheel_roll"/>
        <limit effort="200" velocity="1000.0"/>
        <axis xyz="0 0 1"/>
    </joint>
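To see what the joint origin does, the following sketch applies the URDF roll-pitch-yaw convention (fixed axes, R = Rz(yaw) * Ry(pitch) * Rx(roll)) to the joint axis. With rpy="0 1.57 0", the joint's local z axis ends up pointing almost exactly along the cube's x axis, and the 0.1 m offset in x places the wheel on the face of the 0.2 m cube. The helper names rpy_to_matrix and rotate are illustrative only.

```python
import math


def rpy_to_matrix(roll, pitch, yaw):
    """Rotation matrix for fixed-axis roll-pitch-yaw: Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Joint axis "0 0 1" expressed in the parent (cube_body) frame.
axis_in_parent = rotate(rpy_to_matrix(0.0, 1.57, 0.0), [0.0, 0.0, 1.0])
print(axis_in_parent)
```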

If you launch it again, you should see the red cylinder appear.

Step 3. Make the robot move

First, we need to add the gazebo_ros_control plugin to the URDF

        <plugin name="gazebo_ros_control" filename="libgazebo_ros_control.so">

We’ll add the transmission part to actuate the robot. (NOTICE: the joint name in the transmission must be the same as the name of the joint.)

    <transmission name="inertia_wheel_roll_joint_trans">
      <joint name="inertia_wheel_roll_joint">
      <actuator name="inertia_wheel_roll_jointMotor">

Then we create a moving_cube.yaml file under the my_moving_cube_description/config directory to define the parameters for the controller

# .yaml config file
# The PID gains and controller settings must be saved in a yaml file that gets loaded
# to the param server via the roslaunch file (moving_cube_control.launch).

my_moving_cube:
  # Publish all joint states -----------------------------------
  # Creates the /joint_states topic necessary in ROS
  joint_state_controller:
    type: joint_state_controller/JointStateController
    publish_rate: 30

  # Effort Controllers ---------------------------------------
  inertia_wheel_roll_joint_velocity_controller:
    type: effort_controllers/JointVelocityController
    joint: inertia_wheel_roll_joint
    pid: {p: 1.0, i: 0.0, d: 0.0}

Finally, create a new launch file called moving_cube_control.launch under the launch folder to launch the controller

<?xml version="1.0" encoding="UTF-8"?>

  <rosparam file="$(find my_moving_cube_description)/config/moving_cube.yaml"

  <node name="robot_state_publisher_moving_cube" pkg="robot_state_publisher" type="robot_state_publisher"
        respawn="false" output="screen">
            <param name="publish_frequency" type="double" value="30.0" />
            <param name="ignore_timestamp" type="bool" value="true" />
            <param name="tf_prefix" type="string" value="moving_cube" />
            <remap from="/joint_states" to="/moving_cube/joint_states" />

  <node name="controller_spawner" pkg="controller_manager" type="spawner" respawn="false"
        output="screen" args="--namespace=/my_moving_cube


Let’s spawn the cube and launch the controller

roslaunch my_moving_cube_description spawn_moving_cube.launch
roslaunch my_moving_cube_description moving_cube_control.launch

Then we can kick the robot and make it move by publishing to a topic like this

rostopic pub /my_moving_cube/inertia_wheel_roll_joint_velocity_controller/command std_msgs/Float64 "data: 80.0"
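The same command can also be sent from a Python node. This is a minimal sketch, assuming a running ROS master and the controller loaded as above; command_topic, publish_velocity and the node name cube_wheel_commander are hypothetical names chosen for the example.

```python
def command_topic(robot_ns, controller_name):
    """Build the command topic name for a controller in a robot namespace."""
    return "/%s/%s/command" % (robot_ns.strip("/"), controller_name)


TOPIC = command_topic("my_moving_cube",
                      "inertia_wheel_roll_joint_velocity_controller")


def publish_velocity(speed, duration=1.0):
    """Publish a wheel velocity command repeatedly for `duration` seconds."""
    # Imported here so the helper above can be used without a ROS installation.
    import rospy
    from std_msgs.msg import Float64

    rospy.init_node("cube_wheel_commander", anonymous=True)  # hypothetical node name
    pub = rospy.Publisher(TOPIC, Float64, queue_size=1)
    rate = rospy.Rate(10)  # 10 Hz
    end_time = rospy.get_time() + duration
    while not rospy.is_shutdown() and rospy.get_time() < end_time:
        pub.publish(Float64(data=speed))
        rate.sleep()
```

Calling publish_velocity(80.0) from a shell where the workspace is sourced should have roughly the same effect as the rostopic pub command.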

Congratulations! Your cube robot moved a bit!


Edit by: Tony Huang


This post is part of a series. If you missed the previous post, you can find it here. In this 5th video of the robotic manipulator series, we will extend the ROS controllers to all joints of our robot using XACRO. At the end of the video, we’ll have a robot that is fully controlled through ROS topics. We are also going to use RQT Publisher and RQT Reconfigure to do some experiments with the robot.

Step 0. Create a project in ROS Development Studio (ROSDS)

ROSDS helps you follow our tutorial at a fast pace without having to set up an environment locally. If you don’t have an account yet, you can create a free one here. We’ll start from the project of the previous video, manipulator_video_no4.

Step 1. Configure controller

To use controllers with our robot, we first need to add a transmission definition for each joint in the links_joints.xacro file and limits for each joint in the mrm.xacro file. Most importantly, we need to add the Gazebo plugin in the mrm.xacro file.

<xacro:macro name="m_joint" params="name type axis_xyz origin_rpy origin_xyz parent child limit_e limit_l limit_u limit_v">
    <joint name="${name}" type="${type}">
      <axis xyz="${axis_xyz}" />
      <limit effort="${limit_e}" lower="${limit_l}" upper="${limit_u}" velocity="${limit_v}" />
      <origin rpy="${origin_rpy}" xyz="${origin_xyz}" />
      <parent link="${parent}" />
      <child link="${child}" />
    <transmission name="trans_${name}">
      <joint name="${name}">
      <actuator name="motor_${name}">
<m_joint name="${link_01_name}__${link_02_name}" type="revolute"
           axis_xyz="0 1 0"
           origin_rpy="0 0 0" origin_xyz="0 0 0.4"
           parent="link_01" child="link_02"
           limit_e="1000" limit_l="0" limit_u="0.5" limit_v="0.5" />
  <m_link_cylinder name="${link_02_name}"
              origin_rpy="0 0 0" origin_xyz="0 0 0.4"
              ixx="12.679" ixy="0" ixz="0"
              iyy="12.679" iyz="0"
              radius="0.15" length="0.8" />
  <m_joint name="${link_02_name}__${link_03_name}" type="revolute"
           axis_xyz="0 1 0"
           origin_rpy="0 0 0" origin_xyz="0 0 0.8"
           parent="link_02" child="link_03"
           limit_e="1000" limit_l="0" limit_u="0.75" limit_v="0.5" />
  <m_link_cylinder name="${link_03_name}"
              origin_rpy="0 0 0" origin_xyz="0 0 0.4"
              ixx="12.679" ixy="0" ixz="0"
              iyy="12.679" iyz="0"
              radius="0.15" length="0.8" />
  <m_joint name="${link_03_name}__${link_04_name}" type="revolute"
           axis_xyz="0 1 0"
           origin_rpy="0 0 0" origin_xyz="0 0 0.8"
           parent="link_03" child="link_04"
           limit_e="1000" limit_l="0" limit_u="0.75" limit_v="0.5" />
  <m_link_cylinder name="${link_04_name}"
              origin_rpy="0 0 0" origin_xyz="0 0 0.4"
              ixx="12.679" ixy="0" ixz="0"
              iyy="12.679" iyz="0"
              radius="0.15" length="0.8" />
  <m_joint name="${link_04_name}__${link_05_name}" type="revolute"
           axis_xyz="0 0 1"
           origin_rpy="0 0 0" origin_xyz="0 0 0.8"
           parent="link_04" child="link_05"
           limit_e="1000" limit_l="-3.14" limit_u="3.14" limit_v="0.5" />

    <plugin name="gazebo_ros_control" filename="libgazebo_ros_control.so">
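The limit_l and limit_u arguments of the m_joint macro bound each revolute joint in radians. As a rough illustration, a client script could clamp a commanded position to those bounds before publishing it. JOINT_LIMITS and clamp_command are hypothetical names; the values are copied from the macro calls above, and the base joint is left out because its limits are not shown here.

```python
# (lower, upper) position limits in radians, copied from the m_joint calls.
JOINT_LIMITS = {
    "link_01__link_02": (0.0, 0.5),
    "link_02__link_03": (0.0, 0.75),
    "link_03__link_04": (0.0, 0.75),
    "link_04__link_05": (-3.14, 3.14),
}


def clamp_command(joint_name, position):
    """Clamp a commanded joint position to the joint's URDF limits."""
    lower, upper = JOINT_LIMITS[joint_name]
    return max(lower, min(upper, position))


print(clamp_command("link_01__link_02", 1.0))
print(clamp_command("link_04__link_05", -4.0))
```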

Then we create a file called joints.yaml in the config folder, which defines the parameters for the controllers.

# Publish all joint states -----------------------------------
  type: joint_state_controller/JointStateController
  publish_rate: 50

# Position Controllers ---------------------------------------
  type: effort_controllers/JointPositionController
  joint: base_link__link_01
  pid: {p: 2000.0, i: 100, d: 500.0}
  type: effort_controllers/JointPositionController
  joint: link_01__link_02
  pid: {p: 50000.0, i: 100, d: 2000.0}
  type: effort_controllers/JointPositionController
  joint: link_02__link_03
  pid: {p: 20000.0, i: 50, d: 1000.0}
  type: effort_controllers/JointPositionController
  joint: link_03__link_04
  pid: {p: 2000.0, i: 50, d: 200.0}
  type: effort_controllers/JointPositionController
  joint: link_04__link_05
  pid: {p: 700.0, i: 50, d: 70.0}

This file defines the type of the controller and the pid parameters for each controller.
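To give a feeling for what the p, i and d gains do, here is a simplified textbook PID update with the output clamped to the joint's effort limit. It is only a sketch: the controller actually used by ros_control has more features (for example integral clamping), and the PID class below is a hypothetical stand-in. The gains are the ones listed for link_04__link_05, and the effort limit is the limit_e value of 1000 from the xacro file.

```python
class PID(object):
    """Simplified PID controller with a symmetric output limit."""

    def __init__(self, p, i, d, effort_limit):
        self.p, self.i, self.d = p, i, d
        self.effort_limit = effort_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured, dt):
        error = target - measured
        self.integral += error * dt
        # No derivative term on the very first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        effort = self.p * error + self.i * self.integral + self.d * derivative
        return max(-self.effort_limit, min(self.effort_limit, effort))


pid = PID(p=700.0, i=50.0, d=70.0, effort_limit=1000.0)
effort = pid.update(target=0.5, measured=0.0, dt=0.01)
print(effort)
```

With very large gains, such as p: 50000.0, even a small position error saturates the output at the effort limit, which is one reason the robot can oscillate before the gains are tuned.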

The last step is to modify the spawn.launch file so that it loads the controller parameters and spawns the controllers.

<?xml version="1.0" encoding="UTF-8"?>
    <group ns="/mrm">
        <!-- Robot model -->
        <param name="robot_description" command="$(find xacro)/xacro --inorder '$(find mrm_description)/urdf/mrm.xacro'" />
        <arg name="x" default="0"/>
        <arg name="y" default="0"/>
        <arg name="z" default="0.5"/>
        <!-- Spawn the robot model -->
        <node name="mybot_spawn" pkg="gazebo_ros" type="spawn_model" output="screen"
              args="-urdf -param robot_description -model mrm -x $(arg x) -y $(arg y) -z $(arg z)" />
        <!-- Load controllers -->
        <rosparam command="load" file="$(find mrm_description)/config/joints.yaml" />
        <!-- Controllers -->
        <node name="controller_spawner" pkg="controller_manager" type="spawner"
            respawn="false" output="screen" ns="/mrm"
            --timeout 60">
        <!-- rqt -->
        <node name="rqt_reconfigure" pkg="rqt_reconfigure" type="rqt_reconfigure" />
        <node name="rqt_publisher" pkg="rqt_publisher" type="rqt_publisher" />

Then you can open an empty simulation from Simulations->Empty and launch it with the following command

roslaunch mrm_description spawn.launch

After you see the robot appear in the simulation, you can open the graphical tool from Tools->graphical tool

Using the rqt tools, you can publish values to the controller command topics (e.g. the data field of /mrm/joint4_position_controller/command) to move the joints of the robot.

The PID parameters are not well tuned yet; we’ll tune them in the next post. Still, the robot is now moving (with some oscillation).

Want to learn more?

If you want to learn more about how to create a manipulator simulation in ROS, please check our Robot Creation with URDF ROS and ROS Control 101 courses.


Edit by: Tony Huang




We continue setting up OpenAI-Gym to make a Hopper robot learn in Gazebo simulator, using ROS Development Studio.


Part 2

Step 1. Environment setup

We’ll start by explaining how to create the gym environment for training. Create a file called monoped_env.py under the my_hopper_training/src directory.

This script creates the gym environment with the following parts:

1. Register the training environment

You always need this step to tell the gym package that a new training environment has been created.

2. Load the desired pose

The parameters related to the training are loaded at this step.

3. Connect with Gazebo

To connect the gym environment with the Gazebo simulation, we use a script called gazebo_connection.py in the same directory.

4. Connect with the controller

TF does not cope well with the environment being reset, which causes problems, so we have to reset the controllers manually with the controller_connection.py script. We won’t go into detail here. If you want to learn more about the controllers, please check our controller 101 course.

5. Generate the state object

The state object is then generated from the current state by the monoped_state.py script.

On reset, the environment is set back to its initial observation. This includes steps such as pausing the simulation and resetting the controllers, as described before.

On each step, an action is decided based on the current state and then published with the following joint_publisher.py script. After that, the reward is calculated based on the new observation.

#!/usr/bin/env python

import rospy
import math
from std_msgs.msg import String
from std_msgs.msg import Float64

class JointPub(object):
    def __init__(self):

        self.publishers_array = []
        self._haa_joint_pub = rospy.Publisher('/monoped/haa_joint_position_controller/command', Float64, queue_size=1)
        self._hfe_joint_pub = rospy.Publisher('/monoped/hfe_joint_position_controller/command', Float64, queue_size=1)
        self._kfe_joint_pub = rospy.Publisher('/monoped/kfe_joint_position_controller/command', Float64, queue_size=1)

        # Keep the publishers in joint order (haa, hfe, kfe) so move_joints can iterate over them
        self.publishers_array.append(self._haa_joint_pub)
        self.publishers_array.append(self._hfe_joint_pub)
        self.publishers_array.append(self._kfe_joint_pub)

        self.init_pos = [0.0,0.0,0.0]

    def set_init_pose(self):
        """
        Sets joints to initial position [0,0,0]
        """
        # Wait for the subscribers, then publish the initial pose
        self.check_publishers_connection()
        self.move_joints(self.init_pos)

    def check_publishers_connection(self):
        """
        Checks that all the publishers are working
        """
        rate = rospy.Rate(10)  # 10hz
        while (self._haa_joint_pub.get_num_connections() == 0):
            rospy.logdebug("No subscribers to _haa_joint_pub yet so we wait and try again")
            try:
                rate.sleep()
            except rospy.ROSInterruptException:
                # This is to avoid an error when the world is reset and time goes backwards.
                pass
        rospy.logdebug("_haa_joint_pub Publisher Connected")

        while (self._hfe_joint_pub.get_num_connections() == 0):
            rospy.logdebug("No subscribers to _hfe_joint_pub yet so we wait and try again")
            try:
                rate.sleep()
            except rospy.ROSInterruptException:
                # This is to avoid an error when the world is reset and time goes backwards.
                pass
        rospy.logdebug("_hfe_joint_pub Publisher Connected")

        while (self._kfe_joint_pub.get_num_connections() == 0):
            rospy.logdebug("No subscribers to _kfe_joint_pub yet so we wait and try again")
            try:
                rate.sleep()
            except rospy.ROSInterruptException:
                # This is to avoid an error when the world is reset and time goes backwards.
                pass
        rospy.logdebug("_kfe_joint_pub Publisher Connected")

        rospy.logdebug("All Publishers READY")

    def joint_mono_des_callback(self, msg):
        # The body of this callback is not shown in the listing; it is left as a no-op here.
        pass

    def move_joints(self, joints_array):

        i = 0
        for publisher_object in self.publishers_array:
          joint_value = Float64()
          joint_value.data = joints_array[i]
          publisher_object.publish(joint_value)
          i += 1

    def start_loop(self, rate_value = 2.0):
        rospy.logdebug("Start Loop")
        pos1 = [0.0,0.0,1.6]
        pos2 = [0.0,0.0,-1.6]
        position = "pos1"
        rate = rospy.Rate(rate_value)
        while not rospy.is_shutdown():
          # Alternate between the two poses on every cycle
          if position == "pos1":
            self.move_joints(pos1)
            position = "pos2"
          else:
            self.move_joints(pos2)
            position = "pos1"
          rate.sleep()

    def start_sinus_loop(self, rate_value = 2.0):
        rospy.logdebug("Start Loop")
        w = 0.0
        x = 2.0*math.sin(w)
        #pos_x = [0.0,0.0,x]
        #pos_x = [x, 0.0, 0.0]
        pos_x = [0.0, x, 0.0]
        rate = rospy.Rate(rate_value)
        while not rospy.is_shutdown():
            self.move_joints(pos_x)
            rate.sleep()
            w += 0.05
            x = 2.0 * math.sin(w)
            #pos_x = [0.0, 0.0, x]
            #pos_x = [x, 0.0, 0.0]
            pos_x = [0.0, x, 0.0]

if __name__=="__main__":
    rospy.init_node('joint_publisher_node')  # assumed node name
    joint_publisher = JointPub()
    rate_value = 50.0
    # Assumed entry point; start_loop(rate_value) can be used in the same way
    joint_publisher.start_sinus_loop(rate_value)
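The reset and step behaviour described above can be sketched without ROS or Gazebo. The class below is a toy stand-in, not the monoped_env.py environment: ToyHopperEnv, its joint_limit termination rule and its constant rewards are assumptions made for illustration. Only the action numbering (haa+, haa-, hfe+, hfe-, kfe+, kfe- as 0 to 5), the joint increment of 0.05 rad and the reward magnitudes are taken from this series.

```python
class ToyHopperEnv(object):
    """Stand-in with the same reset()/step() shape as a Gym environment."""

    def __init__(self, joint_increment=0.05, joint_limit=1.6,
                 alive_reward=100.0, done_reward=-1000.0):
        self.joint_increment = joint_increment
        self.joint_limit = joint_limit  # hypothetical termination threshold
        self.alive_reward = alive_reward
        self.done_reward = done_reward
        self.joints = [0.0, 0.0, 0.0]  # haa, hfe, kfe

    def reset(self):
        """Put the joints back to the initial pose and return the first state."""
        self.joints = [0.0, 0.0, 0.0]
        return self._observe()

    def step(self, action):
        """Apply one of six actions: even numbers increase a joint, odd decrease it."""
        joint_index = action // 2
        sign = 1.0 if action % 2 == 0 else -1.0
        self.joints[joint_index] += sign * self.joint_increment
        done = any(abs(j) > self.joint_limit for j in self.joints)
        reward = self.done_reward if done else self.alive_reward
        return self._observe(), reward, done, {}

    def _observe(self):
        # A string state can be used directly as a dictionary key by Q-learning.
        return "_".join("%.2f" % j for j in self.joints)


env = ToyHopperEnv()
state = env.reset()
next_state, reward, done, info = env.step(4)  # kfe+
print(state, next_state, reward, done)
```

In the real environment, the step would publish the new joint positions through JointPub, let the simulation run for a short time, and compute the reward from the observed state.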

Step 2. Training

Now you have all the code you need to start training. Let’s run the simulation first. Go to Simulations -> Select launch file -> my_legged_robot_sims->main.launch to launch the hopper robot simulation in Gazebo.

Then you can run the training with

roslaunch my_hopper_training main.launch

The algorithm is working and starts to train the robot to perform the task we want. However, it still requires a lot of tuning to work properly.

You can have multiple instances and train robots with different parameters in parallel with our paid program in ROSDS. Please check it here if you are interested.


Edit by: Tony Huang




Check Out this OpenAI course in RobotIgnite Academy for learning the basics step by step: https://wp.me/P9Rthq-1UZ


[ROS Projects] – My Robotic Manipulator – #Part 4 – ROS + URDF/Transmission + Gazebo Controllers


In this video, we are going to add a new URDF element to our robotic manipulator: transmissions. After that, we can integrate ROS and the Gazebo simulation using ROS controllers. We’ll use a simplified model of the robot to see how it works. At the end of the video, we’ll be able to send joint position commands to the robot by publishing to some ROS topics.


[ROS Projects] OpenAI with Hopper Robot in Gazebo Step-by-Step


In this series, we are going to show you how to build a hopper robot in ROS and make it learn to hop using a reinforcement learning algorithm. The hopper robot simulation was built in the last post. In case you didn’t follow it, you can find the post here.

Part 1

Use OpenAI to make a Hopper robot learn in the Gazebo simulator, using ROS Development Studio. We will use Q-learning and Gym for that.

Step 1. Create a training package

Let’s create a package for training

cd ~/simulation_ws/src/loco_motion
catkin_create_pkg my_hopper_training rospy

Then we create a launch file called main.launch inside the my_hopper_training/launch directory with the following content

<!--
    Date of creation: 5/II/2018
    Application created by: Miguel Angel Rodriguez <duckfrost@theconstructsim.com>
    The Construct https://www.theconstruct.ai
    License LGPLV3 << Basically means you can do whatever you want with this!
-->

<launch>

    <!-- Load the parameters for the algorithm -->
    <rosparam command="load" file="$(find my_hopper_training)/config/qlearn_params.yaml" />

    <!-- Launch the training system -->
    <node pkg="my_hopper_training" name="monoped_gym" type="start_training_v2.py" output="screen"/>
</launch>

To implement reinforcement learning, we’ll use an algorithm called Q-learning. We’ll save its parameters as qlearn_params.yaml under the my_hopper_training/config directory with the following content

# Algorithm Parameters
alpha: 0.1
gamma: 0.8
epsilon: 0.9
epsilon_discount: 0.999 # about 2200 episodes to reach 0.1
nepisodes: 100000
nsteps: 1000

# Environment Parameters
desired_pose:
    x: 0.0
    y: 0.0
    z: 1.0
desired_force: 7.08 # In Newtons, normal contact force when standing still with 9.81 gravity
desired_yaw: 0.0 # Desired yaw in radians for the hopper to stay
max_height: 3.0   # in meters
min_height: 0.5   # in meters
max_incl: 1.57       # in rads
running_step: 0.001   # in seconds
joint_increment_value: 0.05  # in radians
done_reward: -1000.0 # reward
alive_reward: 100.0 # reward

weight_r1: 1.0 # Weight for joint positions ( joints in the zero is perfect )
weight_r2: 0.0 # Weight for joint efforts ( no efforts is perfect )
weight_r3: 1.0 # Weight for contact force similar to desired ( weight of monoped )
weight_r4: 1.0 # Weight for orientation ( vertical is perfect )
weight_r5: 1.0 # Weight for distance from desired point ( on the point is perfect )
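The interaction between epsilon and epsilon_discount is easy to check numerically. The training script multiplies epsilon by epsilon_discount once per episode, so the number of episodes needed to reach a given exploration rate follows from a logarithm. The helper episodes_to_reach is a hypothetical name used for this sketch.

```python
import math


def episodes_to_reach(target, epsilon=0.9, discount=0.999):
    """Number of multiplicative decay steps until epsilon drops to `target` or below."""
    return int(math.ceil(math.log(target / epsilon) / math.log(discount)))


print(episodes_to_reach(0.1))
print(episodes_to_reach(0.05))
```

With the values above, epsilon needs roughly 2200 episodes to fall to 0.1, so the robot explores almost randomly for a long time at the start of training.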

In this post, we’ll focus on explaining the training script. Let’s create it under the my_hopper_training/src directory and call it start_training_v2.py with the following content

#!/usr/bin/env python

'''
    Original Training code made by Ricardo Tellez <rtellez@theconstructsim.com>
    Modified by Miguel Angel Rodriguez <duckfrost@theconstructsim.com>
    Visit our website at https://www.theconstruct.ai
'''
import gym
import time
import numpy
import random
import qlearn
from gym import wrappers
from std_msgs.msg import Float64
# ROS packages required
import rospy
import rospkg

# import our training environment
import monoped_env

if __name__ == '__main__':
    rospy.init_node('monoped_gym', anonymous=True, log_level=rospy.INFO)

    # Create the Gym environment
    env = gym.make('Monoped-v0')
    rospy.logdebug ( "Gym environment done")
    reward_pub = rospy.Publisher('/monoped/reward', Float64, queue_size=1)
    episode_reward_pub = rospy.Publisher('/monoped/episode_reward', Float64, queue_size=1)

    # Set the logging system
    rospack = rospkg.RosPack()
    pkg_path = rospack.get_path('my_hopper_training')
    outdir = pkg_path + '/training_results'
    env = wrappers.Monitor(env, outdir, force=True)
    rospy.logdebug("Monitor Wrapper started")
    last_time_steps = numpy.ndarray(0)

    # Loads parameters from the ROS param server
    # Parameters are stored in a yaml file inside the config directory
    # They are loaded at runtime by the launch file
    Alpha = rospy.get_param("/alpha")
    Epsilon = rospy.get_param("/epsilon")
    Gamma = rospy.get_param("/gamma")
    epsilon_discount = rospy.get_param("/epsilon_discount")
    nepisodes = rospy.get_param("/nepisodes")
    nsteps = rospy.get_param("/nsteps")

    # Initialises the algorithm that we are going to use for learning
    qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                    alpha=Alpha, gamma=Gamma, epsilon=Epsilon)
    initial_epsilon = qlearn.epsilon

    start_time = time.time()
    highest_reward = 0
    # Starts the main training loop: the one about the episodes to do
    for x in range(nepisodes):
        rospy.loginfo ("STARTING Episode #"+str(x))
        cumulated_reward = 0
        cumulated_reward_msg = Float64()
        episode_reward_msg = Float64()
        done = False
        if qlearn.epsilon > 0.05:
            qlearn.epsilon *= epsilon_discount
        # Initialize the environment and get first state of the robot
        # Now We return directly the stringuified observations called state
        state = env.reset()

        # for each episode, we test the robot for nsteps
        for i in range(nsteps):

            # Pick an action based on the current state
            action = qlearn.chooseAction(state)
            # Execute the action in the environment and get feedback
            rospy.logdebug("###################### Start Step...["+str(i)+"]")
            rospy.logdebug("haa+,haa-,hfe+,hfe-,kfe+,kfe- >> [0,1,2,3,4,5]")
            rospy.logdebug("Action to Perform >> "+str(action))
            nextState, reward, done, info = env.step(action)
            rospy.logdebug("END Step...")
            rospy.logdebug("Reward ==> " + str(reward))
            cumulated_reward += reward
            if highest_reward < cumulated_reward:
                highest_reward = cumulated_reward

            rospy.logdebug("env.get_state...[distance_from_desired_point,base_roll,base_pitch,base_yaw,contact_force,joint_states_haa,joint_states_hfe,joint_states_kfe]==>" + str(nextState))

            # Make the algorithm learn based on the results
            qlearn.learn(state, action, reward, nextState)

            # We publish the cumulated reward
            cumulated_reward_msg.data = cumulated_reward
            reward_pub.publish(cumulated_reward_msg)

            if not(done):
                state = nextState
            else:
                rospy.logdebug ("DONE")
                last_time_steps = numpy.append(last_time_steps, [int(i + 1)])
                break

            rospy.logdebug("###################### END Step...["+str(i)+"]")

        m, s = divmod(int(time.time() - start_time), 60)
        h, m = divmod(m, 60)
        episode_reward_msg.data = cumulated_reward
        episode_reward_pub.publish(episode_reward_msg)
        rospy.loginfo( ("EP: "+str(x+1)+" - [alpha: "+str(round(qlearn.alpha,2))+" - gamma: "+str(round(qlearn.gamma,2))+" - epsilon: "+str(round(qlearn.epsilon,2))+"] - Reward: "+str(cumulated_reward)+"     Time: %d:%02d:%02d" % (h, m, s)))

    rospy.loginfo ( ("\n|"+str(nepisodes)+"|"+str(qlearn.alpha)+"|"+str(qlearn.gamma)+"|"+str(initial_epsilon)+"*"+str(epsilon_discount)+"|"+str(highest_reward)+"| PICTURE |"))

    l = last_time_steps.tolist()

    rospy.loginfo("Overall score: {:0.2f}".format(last_time_steps.mean()))
    rospy.loginfo("Best 100 score: {:0.2f}".format(reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:])))
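The final statistics in the script rely on reduce, which is a builtin only in Python 2. A version-independent sketch of the same bookkeeping is shown below; summarize and format_elapsed are hypothetical helper names, and the input is assumed to be the list of episode lengths collected in last_time_steps.

```python
def summarize(episode_steps, window=100):
    """Return (overall mean, mean of the last `window` entries) of episode lengths."""
    overall = sum(episode_steps) / float(len(episode_steps))
    recent = episode_steps[-window:]
    return overall, sum(recent) / float(len(recent))


def format_elapsed(seconds):
    """Format a duration in seconds as h:mm:ss, as done in the episode log line."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, s)


overall, best = summarize([10, 20, 30], window=2)
print("Overall score: {:0.2f}".format(overall))
print("Best {} score: {:0.2f}".format(2, best))
print(format_elapsed(3725))
```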


We won’t explain the Q-learning algorithm in detail. You can find a tutorial here if you are interested. Simply copy the following code into a file called qlearn.py and put it under the my_hopper_training/src directory

'''
Q-learning approach for different RL problems
as part of the basic series on reinforcement learning @

Inspired by https://gym.openai.com/evaluations/eval_kWknKOkPQ7izrixdhriurA
        @author: Victor Mayoral Vilches <victor@erlerobotics.com>
'''

import random

class QLearn:
    def __init__(self, actions, epsilon, alpha, gamma):
        self.q = {}
        self.epsilon = epsilon  # exploration constant
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.actions = actions

    def getQ(self, state, action):
        return self.q.get((state, action), 0.0)

    def learnQ(self, state, action, reward, value):
        '''
        Q-learning:
            Q(s, a) += alpha * (reward(s, a) + gamma * max(Q(s')) - Q(s, a))
        '''
        oldv = self.q.get((state, action), None)
        if oldv is None:
            self.q[(state, action)] = reward
        else:
            self.q[(state, action)] = oldv + self.alpha * (value - oldv)

    def chooseAction(self, state, return_q=False):
        q = [self.getQ(state, a) for a in self.actions]
        maxQ = max(q)

        if random.random() < self.epsilon:
            minQ = min(q); mag = max(abs(minQ), abs(maxQ))
            # add random values to all the actions, recalculate maxQ
            q = [q[i] + random.random() * mag - .5 * mag for i in range(len(self.actions))] 
            maxQ = max(q)

        count = q.count(maxQ)
        # In case there're several state-action max values 
        # we select a random one among them
        if count > 1:
            best = [i for i in range(len(self.actions)) if q[i] == maxQ]
            i = random.choice(best)
        else:
            i = q.index(maxQ)

        action = self.actions[i]        
        if return_q: # if they want it, give it!
            return action, q
        return action

    def learn(self, state1, action1, reward, state2):
        maxqnew = max([self.getQ(state2, a) for a in self.actions])
        self.learnQ(state1, action1, reward, reward + self.gamma*maxqnew)
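A worked example may help to read the update in learn and learnQ. The function below restates the same rule for a single table entry, using the alpha and gamma values from qlearn_params.yaml; q_update is a hypothetical helper, and the Q-values 2.0 and 5.0 are made-up numbers.

```python
def q_update(old_q, reward, max_next_q, alpha=0.1, gamma=0.8):
    """One Q-learning update for an already visited (state, action) pair."""
    target = reward + gamma * max_next_q  # the `value` passed to learnQ
    return old_q + alpha * (target - old_q)


# Old estimate 2.0, reward 1.0, best Q-value of the next state 5.0:
# target = 1.0 + 0.8 * 5.0 = 5.0, so the estimate moves 10% of the way from 2.0 to 5.0.
new_q = q_update(old_q=2.0, reward=1.0, max_next_q=5.0)
print(round(new_q, 2))
```

Note that QLearn treats the first visit of a (state, action) pair differently: the entry is initialised directly with the reward instead of being updated.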

In the training script, we are basically doing the following steps:

  1. create the training environment
  2. read q learn parameters from the parameter server
  3. try to get the highest reward with the q learn algorithm by deciding which action to take based on the current state for each timestep

That’s it for today. For the next post, we are going to explain how to build the gym training environment.


Edit by: Tony Huang




