sim2real-ur-gym-gazebo

Sim2Real Implementation: Gymnasium-Gazebo UREnv for Deep Reinforcement Learning with Reach, Grasp, and Pick-and-Place Environments with Collision Avoidance (Object or Human)

Universal Robot Environment for Gymnasium and ROS Gazebo Interface, based on openai_ros, ur_openai_gym, rg2_simulation, and gazebo_grasp_fix_plugin

https://github.com/ammar-n-abbas/sim2real-ur-gym-gazebo/assets/28593585/f85a4985-b9f3-4c21-9f49-ce0a86b853ed

Simulation Testing & Real-World Validation

Pick-and-place policy visualization in RViz
Sim2Real zero-shot transfer of grasping policy using safe-DRL

Sim2Real zero-shot transfer of grasping policy with obstacle collision avoidance using safe-DRL (environment, top view, and front view)

Object or arm collision avoidance for human-robot collaboration (top and front views)



About The Project

This project applies deep reinforcement learning to a robotics case study, with the main motivation of moving from simulation to reality (sim2real) in safety-critical systems.

BibTeX

@inproceedings{Abbas2024,
  author    = {Abbas, Ammar N and Mehak, Shakra and Chasparis, Georgios C and Kelleher, John D and Guilfoyle, Michael and Leva, Maria Chiara and Ramasubramanian, Aswin K},
  title     = {Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach},
  booktitle = {2024 IEEE/IFAC International Conference on Control, Decision and Information Technologies},
  year      = {2024}
}

Getting Started

Prerequisites

Install ROS Noetic (desktop-full):

sudo apt-get install ros-noetic-desktop-full
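
The build step in the Installation section uses catkin build, which is provided by the catkin_tools package (assuming the standard ROS apt sources are already configured):

sudo apt-get install python3-catkin-tools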

Installation

  1. Clone the repo into the src directory of your catkin workspace
    git clone https://github.com/ammar-n-abbas/sim2real-ur-gym-gazebo.git
    
  2. Install dependencies (from the workspace root)
    rosdep install --from-paths src --ignore-src -r -y
    
  3. Build the package
    catkin build
    
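  4. Source the workspace so ROS can find the new packages (assuming a standard catkin workspace layout)
    source devel/setup.bash
    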

Task Space

State

The “state” is the current situation or configuration of the environment the agent is interacting with, expressed as the set of variables or features the agent needs to make decisions and take actions. In the UR5Env environment, the state is represented as a concatenation of components that describe the current configuration of the robotic system.
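
As a minimal sketch, assuming illustrative components such as the end-effector position, gripper state, and object and target positions (the actual composition is defined in the environment source):

import numpy as np

def build_state(ee_pos, gripper_state, object_pos, target_pos):
    # Concatenate the (assumed) components into one flat observation vector.
    return np.concatenate([
        np.asarray(ee_pos, dtype=np.float32),           # end-effector XYZ
        np.asarray([gripper_state], dtype=np.float32),  # gripper opening
        np.asarray(object_pos, dtype=np.float32),       # object XYZ
        np.asarray(target_pos, dtype=np.float32),       # target XYZ
    ])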

Action

The “action” represents the decisions or moves an agent can take within a given environment. Actions are the choices available to the agent in any given state, and the goal of the reinforcement learning algorithm is to learn a policy that maps states to optimal actions. In the UR5Env environment, the action is a continuous 4-dimensional vector: the desired change in end-effector position (Δx, Δy, Δz) and a gripper command (g).
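
A minimal Gymnasium-style sketch of this action space; the [-1, 1] bounds are an illustrative assumption, not the environment's actual limits:

import numpy as np
from gymnasium import spaces

# 4-D continuous action: Cartesian end-effector deltas plus a gripper command.
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

dx, dy, dz, g = action_space.sample()  # (Δx, Δy, Δz, gripper)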

Reward

The reward function shapes the agent’s behavior toward the desired task outcomes while maintaining safety standards. It incorporates multiple factors, capturing goal attainment, grasping quality, speed violations, velocity at the moment of collision, and collisions themselves. Penalties are imposed for undesired events such as collisions or excessive speeds, while rewards are given for executing proper grasps, incentivizing successful goal completion and safe operation. The reward function is expressed mathematically as shown in the equation below:

\[\text{reward} = -d + g - s_c - c_c - c_{c_c} - g_c - c_v - b_{c_c} - ik_c\]

where \(d\) is the distance to the goal, \(g\) is the grasp reward, and the remaining terms are the penalty costs for the safety violations described above.
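
As a minimal sketch of how such a composite reward can be computed (the variable names mirror the symbols in the equation; their precise semantics and weights are assumptions here and are defined by the environment's reward parameters in ur_gym.yaml):

def compute_reward(d, g, s_c, c_c, c_c_c, g_c, c_v, b_c_c, ik_c):
    """Composite reward following the equation above.

    d is the distance term and g the grasp/goal reward; the remaining
    arguments are penalty costs (speed, collision, and related safety
    violations).
    """
    return -d + g - s_c - c_c - c_c_c - g_c - c_v - b_c_c - ik_c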

Usage

Launch Gazebo Simulation and Spawn the UR in the World

roslaunch ur_gazebo ur5_with_gripper_bringup.launch

UR Gym Configuration YAML File

This YAML file (ur_gym.yaml) contains configuration parameters for the UR Gym environment.

The file is organized into the following parameter groups:

  - General Agent Parameters
  - Initial Conditions
  - Workspace and Initial Pose
  - Object Properties
  - Anomalies
  - Validations
  - Target
  - Actions Parameters
  - Success Parameters
  - Penalty Threshold
  - Reward Parameters
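
These groups are loaded onto the ROS parameter server (e.g., by the bringup launch file) and read inside the environment. A minimal sketch of reading a few values with rospy; the key names and defaults below are illustrative assumptions, not the actual contents of ur_gym.yaml:

import rospy

rospy.init_node("ur_gym_config_demo")

# Hypothetical keys for illustration; see ur_gym.yaml for the real names.
max_steps       = rospy.get_param("/ur_gym/agent/max_steps", 200)
target_pose     = rospy.get_param("/ur_gym/target/pose", [0.5, 0.0, 0.1])
distance_weight = rospy.get_param("/ur_gym/reward/distance_weight", 1.0)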

Reinforcement Learning (RL)

Training Script for the TQC (Truncated Quantile Critics) Algorithm

This script (start_training_tqc.py), taken from SamsungLabs/tqc_pytorch, allows you to train an agent using the TQC algorithm in various OpenAI Gym environments.

rosrun start_training_tqc.py [--env ENV] [--eval_freq EVAL_FREQ] [--max_timesteps MAX_TIMESTEPS]
                             [--seed SEED] [--n_quantiles N_QUANTILES] [--top_quantiles_to_drop_per_net TOP_QUANTILES_TO_DROP_PER_NET]
                             [--n_nets N_NETS] [--batch_size BATCH_SIZE] [--discount DISCOUNT] [--tau TAU] [--log_dir LOG_DIR]
                             [--prefix PREFIX] [--save_model]

Arguments

  --env ENV                             Gym environment id to train in
  --eval_freq EVAL_FREQ                 evaluate the policy every EVAL_FREQ time steps
  --max_timesteps MAX_TIMESTEPS         maximum number of training time steps
  --seed SEED                           random seed
  --n_quantiles N_QUANTILES             number of quantiles per critic network
  --top_quantiles_to_drop_per_net TOP_QUANTILES_TO_DROP_PER_NET
                                        number of top quantiles dropped per critic network
  --n_nets N_NETS                       number of critic networks
  --batch_size BATCH_SIZE               mini-batch size
  --discount DISCOUNT                   discount factor
  --tau TAU                             target-network soft-update coefficient
  --log_dir LOG_DIR                     directory for training logs
  --prefix PREFIX                       prefix for saved files
  --save_model                          save the trained model and optimizer parameters

Example

rosrun start_training_tqc.py --env UR5PickandPlaceEnv-v0 --eval_freq 10000 --max_timesteps 2000000 --seed 42 --save_model

This command runs the training script using the “UR5PickandPlaceEnv-v0” environment, evaluating the agent’s performance every 10,000 time steps, training for a maximum of 2,000,000 time steps, with a random seed of 42, and saving the trained model and optimizer parameters.
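
With the Gazebo simulation running, a trained (or random) policy can be exercised through the standard Gymnasium rollout loop. A minimal sketch; the ur_gym import that registers the environment is an assumption about the package layout:

import gymnasium as gym
import ur_gym  # assumed import that registers the UR environments

env = gym.make("UR5PickandPlaceEnv-v0")
obs, info = env.reset(seed=42)
done = False
while not done:
    action = env.action_space.sample()  # replace with policy(obs) for a trained agent
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()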

Roadmap

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag “enhancement”. Don’t forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Ammar Abbas - ammar.abbas@eu4m.eu

Acknowledgments

This project is part of the research activities carried out within the Collaborative Intelligence for Safety-Critical Systems (CISC) project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Marie Skłodowska-Curie grant agreement no. 955901 (ciscproject.eu).