Universal Robot Environment for Gymnasium and ROS Gazebo Interface based on: openai_ros, ur_openai_gym, rg2_simulation, and gazeboo_grasp_fix_plugin
https://github.com/ammar-n-abbas/sim2real-ur-gym-gazebo/assets/28593585/f85a4985-b9f3-4c21-9f49-ce0a86b853ed
Simulation Testing & Real-World Validation
   
  
  Sim2Real zero-shot transfer of gasping policy using safe-DRL
   
   
  
  Sim2Real zero-shot transfer of grasping policy with obstacle collision avoidance policy using safe-DRL
   
 
  
  Object or arm Collision avoidance for human-robot collaboration
Using Deep reinforcement Learning for a robotics case study with the main motivation of moving from sim2real with safety-critical systems.
@inproceedings{Abbas2024,
  author    = {Abbas, Ammar N and Mehak, Shakra and Chasparis, Georgios C and Kelleher, John D and Guilfoyle, Michael and Leva, Maria Chiara and Ramasubramanian, Aswin K},
  title     = {Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach},
  booktitle = {2024 IEEE/IFAC International Conference on Control, Decision and Information Technologies},
  year      = {2024}
}
This is an example of how to list things you need to use the software and how to install them.
sudo apt-get install ros-noetic-desktop-full
git clone https://github.com/ammar-n-abbas/sim2real-ur-gym-gazebo.git
catkin build
rosdep install --from-paths src --ignore-src -r -y
The “state” refers to the current situation or configuration of the environment that the agent is interacting with. It includes relevant information as a set of variables or features necessary for the agent to make decisions and take action. In the UR5Env environment, the state is represented as a concatenation of various components that provide information about the current state of the robotic system. The state includes:
The “action” represents the decisions or moves an agent can take within a given environment. Actions are the choices available to the agent at any given state, and the goal of the reinforcement learning algorithm is to learn a policy that maps states to optimal actions. The action in the UR5Env environment represents the desired continuous changes to the end effector position (XYZ) and gripper control (G). The 4-dimensional action vector corresponds to:
The objective of the reward function is to shape the agent’s behavior to achieve desired outcomes during robotic tasks while maintaining safety standards. The reward function incorporates multiple factors, capturing aspects such as goal attainment, grasping quality, speed violations, velocity during collision violations, and collisions. It is designed to incentivize behaviors that lead to successful goal completion, and safe operation. Distinct penalties are imposed for undesired events, such as collisions or excessive speeds, while rewards are given for executing proper grasps. The modified reward function is expressed mathematically as shown in the equation below:
\[\text{reward} = - d + g - s_c - c_c - c_{c_c} - g_c - c_v - b_{c_c} -ik_c\]Where,
roslaunch ur_gazebo ur5_with_gripper_bringup.launch
This YAML file (ur_gym.yaml) contains configuration parameters for the UR Gym environment.
env_id: Specifies the Gymnasium environment ID.driver: Indicates the driver used, such as “gazebo”.reset_robot: Determines whether to reset the robot.ft_sensor: Specifies whether to use force/torque sensors.agent_control_dt: Time step for controlling the agent.reset_time: Time allocated for resetting the environment.rand_seed: Seed for random number generation.random_cube_pose: Determines if the initial cube pose is randomized.random_target_pose: Determines if the initial target pose is randomized.random_initial_pose: Determines if the initial robot pose is randomized.cl_target_pose: Specifies if the target pose is controlled.dist_cl: Distance for controlling the target pose.rand_init_interval: Interval for random initialization.workspace: Defines the workspace bounds.init_q: Initial joint configuration.cube_size: Size of the cube.object_disturbance: Indicates whether object disturbance is enabled.random_object_disturbance: Specifies if random object disturbance is enabled.hrc_disturbance: Indicates whether human-robot collaboration disturbance is enabled.random_hrc_disturbance: Specifies if random human-robot collaboration disturbance is enabled.sil_validation: Specifies whether to perform SIL (Software-in-the-loop) validation.goal_a: Position of goal A.goal_b: Position of goal B.n_actions: Number of actions.distance_threshold: Threshold for successful distance.proper_grasp_threshold: Threshold for proper grasp.vel_thresh: Velocity threshold.force_thresh: Force threshold.reward_type: Type of reward.speed_cost: Cost for speed.ik_cost: Cost for inverse kinematics.collision_cost: Cost for collision.coll_vel_cost: Cost for collision velocity.cube_collision_cost: Cost for cube collision.cube_coll_vel_cost: Cost for cube collision velocity.bar_dist_collision_cost: Cost for bar distance collision.gripper_cost: Cost for gripper.grip_rew: Reward for gripping.grip_prop_rew: Proportional reward for gripping.place_rew: Reward for placing.rew_scale_factor: Scaling factor for rewards.steps_per_episode: Number of steps per episode for RL training.This script (start_training_tqc.py), taken from SamsungLabs/tqc_pytorch, allows you to train an agent using the TQC algorithm in various OpenAI Gym environments.
rosrun start_training_tqc.py [--env ENV] [--eval_freq EVAL_FREQ] [--max_timesteps MAX_TIMESTEPS]
                             [--seed SEED] [--n_quantiles N_QUANTILES] [--top_quantiles_to_drop_per_net TOP_QUANTILES_TO_DROP_PER_NET]
                             [--n_nets N_NETS] [--batch_size BATCH_SIZE] [--discount DISCOUNT] [--tau TAU] [--log_dir LOG_DIR]
                             [--prefix PREFIX] [--save_model]
--env: (Default: “UR5PickandPlaceEnv-v0”) The name of the Gymnasium environment.--eval_freq: (Default: 5000) Frequency (in time steps) at which to evaluate the agent’s performance.--max_timesteps: (Default: 10000000000) Maximum number of time steps to run the environment.--seed: (Default: 0) Seed for random number generation.--n_quantiles: (Default: 25) Number of quantile samples to draw for each state-action pair.--top_quantiles_to_drop_per_net: (Default: 2) Number of top quantiles to drop per network.--n_nets: (Default: 5) Number of critic networks to use.--batch_size: (Default: 256) Batch size for both the actor and critic networks.--discount: (Default: 0.99) Discount factor for future rewards.--tau: (Default: 0.005) Rate at which to update target networks.--log_dir: (Default: “”) Directory to save logs and trained models.--prefix: (Default: “”) Prefix to add to log and model filenames.--save_model: (Default: True) Flag to save trained model and optimizer parameters.rosrun start_training_tqc.py --env UR5PickandPlaceEnv-v0 --eval_freq 10000 --max_timesteps 2000000 --seed 42 --save_model
This command runs the training script using the “UR5PickandPlaceEnv-v0” environment, evaluating the agent’s performance every 10,000 time steps, training for a maximum of 2,000,000 time steps, with a random seed of 42, and saving the trained model and optimizer parameters.
Contributions are what makes the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag “enhancement”. Don’t forget to give the project a star! Thanks again!
git checkout -b feature/AmazingFeature)git commit -m 'Add some AmazingFeature')git push origin feature/AmazingFeature)Distributed under the MIT License. See LICENSE.txt for more information.
Ammar Abbas - ammar.abbas@eu4m.eu
This project is part of the research activities done along the Collaborative Intelligence for Safety-Critical systems (CISC) project that has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 955901. (cicproject.eu)