Adaptive Exploration for Continual Reinforcement Learning

Freek Stulp

Conference paper. Year: 2012

Freek Stulp

Role: Author
PersonId : 1420
IdHAL : freek-stulp
IdRef : 177920629

Robotics and Vision

Flowing Epigenetic Robots and Systems

Abstract

Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: 1) the learning phase, in which the robot learns the task through exploration; and 2) the exploitation phase, in which exploration is turned off and the robot demonstrates its performance on the learned task. In this paper, we present an algorithm that enables robots to alternate between these phases continually and autonomously. We do so by combining the 'Policy Improvement with Path Integrals' (PI2) direct reinforcement learning algorithm with the covariance matrix adaptation rule from the 'Cross-Entropy Method' (CEM) optimization algorithm. This integration is possible because both algorithms iteratively update parameters through probability-weighted averaging. A practical advantage of the resulting algorithm, called PI2-CMA, is that it relieves the user of having to tune the degree of exploration manually. We evaluate PI2-CMA's ability to tune exploration continually and autonomously on two tasks.
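The abstract's key observation is that PI2 and the Cross-Entropy Method share the same update pattern: sample perturbed parameters, turn their costs into probabilities, and take probability-weighted averages. The snippet below is a minimal sketch of that idea, not the paper's implementation. It assumes an episodic simplification with one scalar cost per rollout (PI2 proper works with per-time-step costs), and the function name `pi2_cma_update`, the eliteness parameter `h=10`, the sample count and the tiny ridge term are illustrative assumptions. Besides the mean, the same probabilities are reused to average outer products of the sampled deviations, which yields an adapted covariance, i.e. the amount of exploration is updated rather than fixed by hand.

```python
import numpy as np


def pi2_cma_update(theta, sigma, cost_fn, n_samples=10, h=10.0, rng=None):
    """One simplified, episodic PI2-CMA-style update (illustrative sketch).

    theta: current parameter mean, shape (d,)
    sigma: current exploration covariance, shape (d, d)
    cost_fn: hypothetical callable mapping a parameter vector to a scalar cost
    h: hypothetical eliteness parameter controlling how sharply low costs win
    """
    rng = np.random.default_rng() if rng is None else rng
    # Exploration: sample parameter vectors around the current mean.
    samples = rng.multivariate_normal(theta, sigma, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])

    # Map costs to probabilities: range-normalise, exponentiate, normalise.
    # Lower cost -> higher probability.
    c_min, c_max = costs.min(), costs.max()
    normalised = (costs - c_min) / (c_max - c_min + 1e-12)
    weights = np.exp(-h * normalised)
    probs = weights / weights.sum()

    # Probability-weighted averaging of the samples gives the new mean.
    new_theta = probs @ samples

    # Covariance adaptation: probability-weighted average of the outer
    # products of deviations from the old mean (sum_k p_k d_k d_k^T).
    deltas = samples - theta
    new_sigma = (probs[:, None] * deltas).T @ deltas
    # Assumption: a tiny ridge keeps sigma positive definite for sampling.
    new_sigma += 1e-10 * np.eye(len(theta))
    return new_theta, new_sigma


# Usage on a toy quadratic cost (hypothetical task, not from the paper).
target = np.array([1.0, -2.0])


def cost(th):
    return float(np.sum((th - target) ** 2))


theta, sigma = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for _ in range(100):
    theta, sigma = pi2_cma_update(theta, sigma, cost, rng=rng)
print(theta, np.trace(sigma))
```

Because the covariance is re-estimated from the weighted samples, exploration tends to widen along directions that lead to lower cost and to shrink as the cost stops improving. Practical variants usually bound the covariance from below so that exploration never vanishes entirely; the ridge above is only a numerical safeguard, not such a bound.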

Domains

Robotics [cs.RO]

https://hal.science/hal-00789389

Submitted on: Monday, 18 February 2013, 10:54:41

Last modified on: Wednesday, 15 March 2023, 08:50:07

Dates and versions

hal-00789389 , version 1 (18-02-2013)

Identifiers

HAL Id : hal-00789389 , version 1

Cite

Freek Stulp. Adaptive Exploration for Continual Reinforcement Learning. International Conference on Intelligent Robots and Systems (IROS), 2012, Portugal. pp.0-0. ⟨hal-00789389⟩

Collections

ENSTA INRIA ENSTA_U2IS INRIA2
