Robot Skill Learning: From Reinforcement Learning to Evolution Strategies - ENSTA Paris - École nationale supérieure de techniques avancées Paris
Journal article. Paladyn: Journal of Behavioral Robotics. Year: 2013

Robot Skill Learning: From Reinforcement Learning to Evolution Strategies

Abstract

Policy improvement methods seek to optimize the parameters of a policy with respect to a utility function. Owing to current trends involving searching in parameter space (rather than action space) and using reward-weighted averaging (rather than gradient estimation), reinforcement learning algorithms for policy improvement, e.g. PoWER and PI², are now able to learn sophisticated high-dimensional robot skills. A side-effect of these trends has been that, over the last 15 years, reinforcement learning (RL) algorithms have become more and more similar to evolution strategies such as (μW, λ)-ES and CMA-ES. Evolution strategies treat policy improvement as a black-box optimization problem, and thus do not leverage the problem structure, whereas RL algorithms do. In this paper, we demonstrate how two straightforward simplifications to the state-of-the-art RL algorithm PI² suffice to convert it into the black-box optimization algorithm (μW, λ)-ES. Furthermore, we show that (μW, λ)-ES empirically outperforms PI² on the tasks in [36]. It is striking that PI² and (μW, λ)-ES share a common core, and that the simpler algorithm converges faster and leads to similar or lower final costs. We argue that this difference is due to a third trend in robot skill learning: the predominant use of dynamic movement primitives (DMPs). We show how DMPs dramatically simplify the learning problem, and discuss the implications of this for past and future work on policy improvement for robot skill learning.
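
To make the idea of reward-weighted averaging in parameter space concrete, the following is a minimal Python sketch of an update in the spirit of (μW, λ)-ES, applied to a toy quadratic cost. It is not the authors' implementation: the function names, the log-decreasing rank weights, the constant exploration magnitude `sigma`, and the toy cost are all assumptions chosen for illustration.

```python
import math
import random


def reward_weighted_update(mean, sigma, cost_fn, n_samples=20, n_elite=10, rng=None):
    """One black-box update: perturb the parameters, keep the best samples,
    and return their weighted average as the new parameter vector."""
    rng = rng or random.Random()
    # Explore in parameter space: lambda = n_samples Gaussian perturbations of the mean.
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(n_samples)]
    # Keep the mu = n_elite samples with the lowest cost (best first).
    elite = sorted(samples, key=cost_fn)[:n_elite]
    # Rank-based weights that decrease logarithmically (an assumed, common choice).
    raw = [math.log(n_elite + 0.5) - math.log(rank + 1) for rank in range(n_elite)]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Weighted averaging: better samples contribute more to the new mean.
    return [sum(w * s[d] for w, s in zip(weights, elite)) for d in range(len(mean))]


def optimize(cost_fn, mean, sigma=0.5, n_updates=50, seed=0):
    """Repeat the update with a fixed exploration magnitude (an assumption;
    CMA-ES would additionally adapt the covariance of the perturbations)."""
    rng = random.Random(seed)
    for _ in range(n_updates):
        mean = reward_weighted_update(mean, sigma, cost_fn, rng=rng)
    return mean


def toy_cost(theta):
    """Hypothetical stand-in for the cost of a rollout: squared distance to a target."""
    target = [1.0, -1.0]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


if __name__ == "__main__":
    start = [3.0, -2.0]
    final = optimize(toy_cost, start)
    print("initial cost:", toy_cost(start))
    print("final cost:", toy_cost(final))
```

The sketch only ever sees one scalar cost per sample, which is what treating policy improvement as a black-box optimization problem means here. In a robot setting, `toy_cost` would be replaced by the cost of executing a policy with parameters `theta`.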
No file deposited

Dates and versions

hal-00922132, version 1 (23-12-2013)

Identifiers

  • HAL Id: hal-00922132, version 1

Cite

Freek Stulp, Olivier Sigaud. Robot Skill Learning: From Reinforcement Learning to Evolution Strategies. Paladyn: Journal of Behavioral Robotics, 2013, 4 (1), pp.49-61. ⟨hal-00922132⟩
