Learning Hierarchical Policies from Unsegmented Demonstrations using Causal Information

The use of imitation learning to learn a single policy for a complex task that has multiple modes or hierarchical structure can be challenging. In fact, previous work has shown that learning separate policies for each mode or sub-task can greatly improve the performance of imitation learning. In this work, we model the interaction between sub-tasks and their resulting state-action trajectory sequences as a directed graphical model. We propose a new algorithm based on the generative adversarial imitation learning framework that learns sub-task policies from unsegmented demonstrations. Our approach maximizes the causal information flow in the graphical model between sub-task latent variables and the trajectories they generate. We also show how our approach connects with the existing Options framework commonly used to learn hierarchical policies.
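To make the core objective concrete, below is a minimal sketch (not the authors' code) of the variational lower bound on the causal (directed) information between per-timestep sub-task latents and the generated trajectory, which would be maximized alongside the usual GAIL objective. The class names (`PosteriorNet`), architecture choices (a GRU encoder), and the weighting hyperparameter `lambda_info` are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a directed-information bonus for latent-conditioned GAIL.
# Assumes discrete latents c_t and a recurrent approximate posterior
# q(c_t | s_{1:t}, c_{1:t-1}); all names here are hypothetical.
import torch
import torch.nn as nn

class PosteriorNet(nn.Module):
    """Approximate posterior q(c_t | s_{1:t}, c_{1:t-1}), realized as a
    GRU over concatenated (state, previous-latent one-hot) inputs."""
    def __init__(self, state_dim, n_latents, hidden=64):
        super().__init__()
        self.gru = nn.GRU(state_dim + n_latents, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_latents)

    def forward(self, states, prev_latents_onehot):
        # states: (B, T, state_dim); prev_latents_onehot: (B, T, n_latents)
        h, _ = self.gru(torch.cat([states, prev_latents_onehot], dim=-1))
        return self.head(h)  # logits over c_t at every timestep

def directed_info_bonus(posterior, states, prev_latents_onehot, latents):
    """Monte-Carlo estimate of the variational lower bound
        sum_t log q(c_t | s_{1:t}, c_{1:t-1})
    on the directed information from latents to trajectories.
    latents: (B, T) integer indices of the sampled sub-task variables."""
    logits = posterior(states, prev_latents_onehot)       # (B, T, K)
    log_q = torch.log_softmax(logits, dim=-1)
    picked = log_q.gather(-1, latents.unsqueeze(-1))      # (B, T, 1)
    return picked.squeeze(-1).sum(dim=1).mean()           # mean over batch

# Hypothetical combined policy objective: the GAIL surrogate reward plus
# a weighted directed-information bonus.
#   total_reward = gail_reward + lambda_info * directed_info_bonus(...)
```

In this sketch, maximizing the bonus encourages the posterior to recover the latent sub-task from the trajectory prefix, which in turn pressures the latent-conditioned policy to produce distinguishable per-sub-task behavior from unsegmented demonstrations.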