As robots tackle complex object arrangement tasks, it becomes imperative for them to generalize to complex worlds and scale with the number of objects. This work postulates that extracting action primitives, such as push operations, along with their pre-conditions and effects, enables strong generalization to unseen worlds. Hence, we factorize policy learning as inference of such generic rules, which act as strong priors for predicting actions given the world state. The learnt rules act as propositional knowledge and enable robots to reach goals in a zero-shot manner by applying the rules independently and incrementally. However, hand-engineering such rules, for instance as PDDL descriptions, is hard, especially for unseen worlds.
This work aims to learn generic, sparse, and context-aware rules that govern action primitives in robotic worlds from human demonstrations in simple domains. We demonstrate that our approach, named RLAP, extracts rules without explicit supervision of rule labels and generates goal-reaching plans in complex Sokoban-style domains that scale with the number of objects. RLAP achieves significantly higher goal-reaching rates and shorter planning times than state-of-the-art techniques.
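To make the rule formulation concrete, below is a minimal, illustrative sketch of a grounded rule with pre-conditions and effects, in the spirit described above. It assumes a STRIPS-like state of grounded predicates; the Rule class, predicate names, and state encoding are our own hypothetical simplification, not RLAP's actual learnt representation.

from dataclasses import dataclass

# A world state as a set of grounded predicates, e.g. ("at", "box1", 2, 3).
State = frozenset

@dataclass(frozen=True)
class Rule:
    """Hypothetical STRIPS-like rule: an action primitive with
    pre-conditions and effects, for illustration only."""
    name: str                  # action primitive, e.g. a push operation
    preconditions: frozenset   # predicates that must hold for the rule to fire
    add_effects: frozenset     # predicates added by applying the rule
    del_effects: frozenset     # predicates removed by applying the rule

    def applicable(self, state: State) -> bool:
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        assert self.applicable(state)
        return (state - self.del_effects) | self.add_effects

# Example: a grounded push rule moving box1 from cell (2, 3) to (2, 4).
push = Rule(
    name="push(box1, right)",
    preconditions=frozenset({("at", "box1", 2, 3), ("clear", 2, 4)}),
    add_effects=frozenset({("at", "box1", 2, 4), ("clear", 2, 3)}),
    del_effects=frozenset({("at", "box1", 2, 3), ("clear", 2, 4)}),
)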
Demonstrations
The model is trained on simple demonstrations, as shown.
The model is then tested on much more complex planning tasks in larger world instances. As the example shows, the technique generalizes well beyond the training demonstrations. We achieve this by learning rules from demonstrations (RLAP) and then planning with the learnt rules (RB-MCTS); both components are described in greater depth in the paper, and a simplified planning sketch follows below.
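As an illustration of planning by chaining such rules, the sketch below (reusing the hypothetical Rule class above) runs a plain breadth-first search until the goal predicates hold. This is a deliberately simplified stand-in: the paper's RB-MCTS planner instead guides the search with Monte Carlo tree search, which we do not reproduce here.

from collections import deque

def plan_with_rules(start, goal, rules, max_expansions=10_000):
    """Chain learnt rules with breadth-first search until every goal
    predicate holds. A simplified, hypothetical stand-in for RB-MCTS,
    shown only to illustrate zero-shot planning by applying rules
    independently and incrementally."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and max_expansions > 0:
        max_expansions -= 1
        state, actions = frontier.popleft()
        if goal <= state:  # all goal predicates are satisfied
            return actions
        for rule in rules:
            if rule.applicable(state):
                nxt = rule.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [rule.name]))
    return None  # no plan found within the expansion budget

# Usage with the hypothetical push rule defined above.
start = frozenset({("at", "box1", 2, 3), ("clear", 2, 4)})
goal = frozenset({("at", "box1", 2, 4)})
print(plan_with_rules(start, goal, [push]))  # ['push(box1, right)']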
Citation
@inproceedings{RLAP2024,
  title     = {Unsupervised Learning of Neuro-symbolic Rules for Generalizable Context-aware Planning in Object Arrangement Tasks},
  author    = {Sharma, Siddhant and Tuli, Shreshth and Paul, Rohan},
  booktitle = {IEEE International Conference on Robotics and Automation},
  year      = {2024}
}