Abstract: Interactive grasping from clutter, akin to human dexterity, is one of the longest-standing problems in robot learning.
Challenges stem from the intricacies of visual perception, the demand for precise motor skills, and the complex interplay between the two.
In this work, we present Teacher-Augmented Policy Gradient (TAPG), a novel two-stage learning framework that synergizes reinforcement learning (RL) and policy distillation.
After training a privileged teacher policy to master motor control, TAPG facilitates guided, yet adaptive, learning of a sensorimotor policy, enabling it to navigate the intricacies of a new observation space.
We demonstrate this ability by integrating TAPG with a promptable segmentation model.
Our trained policies adeptly grasp a wide variety of objects from cluttered scenes, both in simulation and in the real world, based on human-understandable prompts.
Furthermore, we show robust zero-shot transfer to novel objects.
Teacher policy on unseen objects
The privileged teacher policy is trained from simulator-state information about the objects, specifically their oriented bounding boxes.
While this leads to efficient learning of the task, the policy does not account for the perception pipeline used during deployment.
Real-world deployment of TAPG policy on unseen objects
TAPG adapts the teacher's behavior to improve the visibility of the target object, yielding a policy that is deployable in the real world.
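The two-stage idea described above can be illustrated with a small sketch. This is an assumed, simplified form of a teacher-augmented objective, not the paper's exact loss: the student (sensorimotor) policy is updated with a standard policy-gradient term plus a distillation term that pulls its action distribution toward the privileged teacher's. The function names, the `beta` weight, and the discrete-action setting are all illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def tapg_loss(log_prob_action, advantage, teacher_dist, student_dist, beta=0.5):
    """Hypothetical combined objective: policy gradient + teacher distillation.

    log_prob_action: student's log-probability of the action taken
    advantage:       estimated advantage of that action
    teacher_dist:    teacher's action distribution (from privileged state)
    student_dist:    student's action distribution (from sensor observations)
    beta:            distillation weight (assumed hyperparameter)
    """
    pg_term = -log_prob_action * advantage               # REINFORCE-style surrogate
    distill_term = kl_divergence(teacher_dist, student_dist)
    return pg_term + beta * distill_term

# Example: the teacher prefers action 0 slightly more than the student does,
# so the distillation term nudges the student toward the teacher.
loss = tapg_loss(np.log(0.6), 1.0, [0.7, 0.3], [0.6, 0.4])
```

The `beta` weight controls how strongly the student is anchored to the teacher; annealing it toward zero would let the student deviate where its own observations (e.g., segmentation masks) demand different behavior, which matches the "guided, yet adaptive" framing in the abstract.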