Data Science Seminar: The Interface of Reinforcement Learning and Planning
Posted 11 Jun 2021
17 December 2020 (Online)
Aviv Tamar, Assistant Professor at the Technion, Israel Institute of Technology, Israel
In reinforcement learning (RL), an agent learns to solve sequential decision making problems through trial and error. When combined with deep networks, RL has the potential to learn domain representations that are important for making good decisions, such as patterns in board games (Go, Chess), or image features in video games. Based on this observation, there is recent interest in applying RL to more general planning problems, such as combinatorial optimization, autonomous driving, or protein folding, where the key idea is that when encountering many similar problems, some structure of the solution could be learned by RL to speed up the planning computation. In this talk, I will discuss the algorithmic interface between planning and RL, focusing on both the representation architecture and the learning algorithms.
We start by asking whether deep networks can learn planning computations, and introduce value iteration networks (VINs), a type of differentiable planner that can be used within model-free RL to obtain better generalization. Next, we consider a practical robotic assembly problem, and show that motion planning, based on readily available CAD data, can be combined with RL to quickly learn policies for assembling tight-fitting objects.

I will then question the suitability of the conventional RL formulation for learning to plan. Most planning problems are goal-based in nature, and a planner should provide a solution for every possible goal. Standard RL, on the other hand, is based on a single-goal formulation (the reward function), and making RL work in a multi-goal setting is challenging. In recent work we developed sub-goal trees (SGTs), a new RL formulation derived from a different first principle: the all-pairs shortest-path problem on graphs. We show that for multi-goal problems, SGTs are provably better at handling approximation errors than conventional RL (O(N log N) vs. O(N^2)), and we propose several novel RL algorithms based on the SGT formalism. We demonstrate results on learning motion planning for a 7-DoF robot.
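The core idea behind a VIN-style planning module can be illustrated with a minimal sketch: value iteration on a 2-D grid expressed as repeated local convolutions followed by a max over actions, which is exactly the structure a VIN makes differentiable. Everything here is an assumption for illustration (the grid world, the hand-fixed `transition_kernels`, the function name); in an actual VIN the kernels are learned end-to-end by backpropagation rather than specified by hand.

```python
import numpy as np

def value_iteration_module(reward, transition_kernels, iterations=20, gamma=0.95):
    """Sketch of a VIN-style planning module on an (H, W) grid.

    reward: (H, W) reward map.
    transition_kernels: (A, 3, 3) array; kernel a weights the neighboring
        values reachable under action a (in a real VIN these are learned).
    Each iteration computes Q_a = R + gamma * (kernel_a convolved with V)
    and sets V = max_a Q_a, mimicking a Bellman backup.
    """
    H, W = reward.shape
    V = np.zeros((H, W))
    for _ in range(iterations):
        Vp = np.pad(V, 1, mode="constant")  # zero value outside the grid
        Q = np.empty((len(transition_kernels), H, W))
        for a, k in enumerate(transition_kernels):
            # Local "convolution": weighted sum of the 3x3 neighborhood of V.
            conv = sum(k[i, j] * Vp[i:i + H, j:j + W]
                       for i in range(3) for j in range(3))
            Q[a] = reward + gamma * conv
        V = Q.max(axis=0)  # max over actions, as in value iteration
    return V
```

With a single rewarding cell and kernels for {stay, up, down, left, right}, the computed value decays smoothly with distance from the reward, which is the planning signature the VIN architecture lets gradients flow through.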
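The recursive structure behind the sub-goal tree formulation can also be sketched. A learned predictor proposes a sub-goal roughly halfway between a start and a goal, and the same predictor is applied recursively to each half; a depth-d tree yields 2^d segments, so only O(log N) sequential predictions separate any state from the start, which is the intuition behind the improved error accumulation compared with step-by-step rollouts. The function below is a hypothetical illustration: `predict_midpoint` stands in for the learned neural sub-goal predictor, and the exact algorithms in the SGT papers differ in how it is trained.

```python
def sgt_plan(predict_midpoint, start, goal, depth):
    """Expand a sub-goal tree of the given depth into a trajectory (sketch).

    predict_midpoint(s, g) should return an intermediate sub-goal between
    states s and g; here any callable works. Returns the in-order list of
    states, start and goal included, with 2**depth segments.
    """
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)
    left = sgt_plan(predict_midpoint, start, mid, depth - 1)
    right = sgt_plan(predict_midpoint, mid, goal, depth - 1)
    return left + right[1:]  # drop the midpoint duplicated at the seam
```

For example, with a toy 1-D predictor that returns the exact midpoint, `sgt_plan(lambda s, g: (s + g) / 2, 0.0, 8.0, 3)` expands to the nine evenly spaced states from 0 to 8 using only three levels of prediction.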