Abstract: Grasping objects of different shapes and sizes - a foundational, effortless skill for humans - remains a challenging task in robotics. Although model-based approaches can predict stable grasp configurations for known object models, they struggle to generalize to novel objects and often operate in a non-interactive open-loop manner. In this work, we present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects by continuously controlling an anthropomorphic robotic hand. We explore several explicit representations of object geometry as input to the policy. Moreover, we propose to inform the policy implicitly through signed distances and show that this is naturally suited to guide the search through a shaped reward component. Finally, we demonstrate that the proposed framework is able to learn even in more challenging conditions, such as targeted grasping from a cluttered bin. Necessary pre-grasping behaviors such as object reorientation and utilization of environmental constraints emerge in this case.
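The abstract mentions guiding the policy through a shaped reward built on signed distances between the hand and the object surface. The exact formulation is not given here, so the following is only a minimal illustrative sketch under assumed conventions: `fingertip_sdists` holds per-fingertip signed distances to the object surface (negative inside, positive outside), and the reward peaks when fingertips touch the surface and decays smoothly with distance.

```python
import numpy as np

def shaped_grasp_reward(fingertip_sdists, scale=0.1):
    """Illustrative shaped reward from fingertip signed distances.

    This is a generic sketch, not the paper's exact reward term:
    it returns 1 when all fingertips lie on the object surface
    (signed distance 0) and decays exponentially as they move
    away from it, in either direction.
    """
    d = np.asarray(fingertip_sdists, dtype=float)
    proximity = np.exp(-np.abs(d) / scale)  # 1 on the surface, -> 0 far away
    return float(proximity.mean())
```

Because the signal is dense and differentiable almost everywhere, such a term can guide exploration toward contact long before any sparse grasp-success reward fires.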
Explicit Representations of Object Geometry
The videos show the explicit object representations that the policy operates on during manipulation. The bounding box is represented by its center pose and extent. Superquadrics are defined by their pose, three size parameters, and two shape parameters. The points visualized in the videos lie on the recovered superquadric. We reduce the framerate of the videos to make the agent's behavior easier to follow, since the policy is optimized to solve the tasks as fast as possible.
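The superquadric parameterization above (three size and two shape parameters, plus a pose) can be made concrete with the standard inside-outside function and parametric surface form. The sketch below works in the superquadric's canonical frame; parameter names (`size`, `shape`) are our own, not taken from the paper's implementation.

```python
import numpy as np

def superquadric_implicit(p, size, shape):
    """Inside-outside function F of a superquadric in its canonical frame.

    size  = (a1, a2, a3): the 3 size parameters (extents along x, y, z)
    shape = (e1, e2): the 2 shape exponents; e1 = e2 = 1 gives an
            ellipsoid, values near 0 give box-like shapes.
    Returns F(p): F < 1 inside, F = 1 on the surface, F > 1 outside.
    """
    a1, a2, a3 = size
    e1, e2 = shape
    x, y, z = p
    f_xy = (np.abs(x / a1) ** (2.0 / e2)
            + np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return f_xy + np.abs(z / a3) ** (2.0 / e1)

def sample_surface_points(size, shape, n=256, rng=None):
    """Sample points on the superquadric surface via its parametric form,
    e.g. to visualize a recovered superquadric as in the videos."""
    rng = np.random.default_rng(rng)
    a1, a2, a3 = size
    e1, e2 = shape
    eta = rng.uniform(-np.pi / 2, np.pi / 2, n)  # latitude angle
    om = rng.uniform(-np.pi, np.pi, n)           # longitude angle

    def spow(b, e):  # signed power keeps all eight octants symmetric
        return np.sign(b) * np.abs(b) ** e

    x = a1 * spow(np.cos(eta), e1) * spow(np.cos(om), e2)
    y = a2 * spow(np.cos(eta), e1) * spow(np.sin(om), e2)
    z = a3 * spow(np.sin(eta), e1)
    return np.stack([x, y, z], axis=1)
```

With unit sizes and exponents of 1 the implicit function reduces to the unit sphere, and every sampled parametric point satisfies F = 1 for any valid exponents.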
Center of mass
The rollouts below show a trained policy picking a target object from clutter and highlight the main challenges: interaction between objects, strong variation in object size, and the difficult accessibility of some positions in the deep bin.
Pick red cup under object interaction
Pick large pitcher
Continuous pre-grasp manipulation to pick tomato soup can