Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments

Liang Gong; Te Sun; Xudong Li; Ke Lin; Natalia Díaz-Rodríguez; David Filliat; Zhengfeng Zhang; Junping Zhang

doi:10.1016/j.ifacol.2021.04.227

Journal article, IFAC-PapersOnLine, Year: 2020



Liang Gong

Role: Author
PersonId: 1117140

Shanghai Jiao Tong University [Shanghai]

Te Sun

Role: Author

Shanghai Jiao Tong University [Shanghai]

Xudong Li

Role: Author

Shanghai Jiao Tong University [Shanghai]

Ke Lin

Role: Author

Shanghai Jiao Tong University [Shanghai]

Natalia Díaz-Rodríguez

Role: Author
PersonId: 170998
IdHAL: natalia-diaz-rodriguez
ORCID: 0000-0003-3362-9326
IdRef: 261850032

Flowing Epigenetic Robots and Systems

Shanghai Jiao Tong University [Shanghai]

David Filliat

Role: Author
PersonId: 45
IdHAL: david-filliat
ORCID: 0000-0002-5739-1618
IdRef: 070072337

Unité d'Informatique et d'Ingénierie des Systèmes

Flowing Epigenetic Robots and Systems

Zhengfeng Zhang

Role: Author

Fudan University [Shanghai]

Junping Zhang

Role: Author

Fudan University [Shanghai]

Abstract

Using direct reinforcement learning (RL) to accomplish a task can be very inefficient, especially in robotic configurations where interactions with the environment are lengthy and costly. Learning from expert demonstration (LfD) is an alternative approach that achieves better performance in an RL setting and also greatly improves sample efficiency. We propose a novel demonstration learning framework for actor-critic based algorithms. Firstly, we put forward an environment pre-training paradigm that initializes the model parameters without interacting with the target environment, which effectively avoids the cold-start problem in deep RL scenarios. Secondly, we design a general-purpose LfD framework for most mainstream actor-critic RL algorithms that include a policy network and a value function, such as PPO, SAC, TRPO and A3C. Thirdly, we build a dedicated model training platform to perform human-robot interaction and numerical experiments. We evaluate the method in six MuJoCo simulated locomotion environments and on our robot control simulation platform. Results show that several epochs of pre-training can improve the agent's performance in the early stage of training, and that external demonstrations also boost the final converged performance of the RL algorithm. Overall, the proposed method improves sample efficiency by 30%. Our demonstration pipeline makes full use of the exploration property of the RL algorithm, which makes it feasible for fast teaching of robots in dynamic environments.
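The abstract does not spell out the pre-training objectives. Purely as an illustration, the sketch below shows one common, generic way to warm-start both components of an actor-critic model from expert demonstrations before any RL updates: behavior cloning for the policy (regressing the expert's action) and Monte Carlo return regression for the value function. The linear models, the names `LinearModel`, `discounted_returns` and `pretrain_from_demos`, the toy expert and all hyperparameters are hypothetical assumptions; this is not the authors' implementation.

```python
import random


def discounted_returns(rewards, gamma):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


class LinearModel:
    """Hypothetical stand-in for a policy or value network: y = w . x + b."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def sgd_step(self, x, target, lr):
        # One gradient step on the squared error 0.5 * (pred - target)^2.
        err = self.predict(x) - target
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * err
        return 0.5 * err * err


def pretrain_from_demos(demos, dim, gamma=0.99, epochs=200, lr=0.05, seed=0):
    """Warm-start an actor (behavior cloning) and a critic (return regression).

    demos: list of trajectories, each a list of (state, action, reward) tuples.
    """
    actor, critic = LinearModel(dim), LinearModel(dim)
    samples = []
    for traj in demos:
        rewards = [r for _, _, r in traj]
        for (s, a, _), g in zip(traj, discounted_returns(rewards, gamma)):
            samples.append((s, a, g))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(samples)
        for s, a, g in samples:
            actor.sgd_step(s, a, lr)   # imitate the expert's action
            critic.sgd_step(s, g, lr)  # fit the expert's discounted return
    return actor, critic


# Toy usage with a hypothetical expert whose action is a = 2*s0 - s1.
demo_rng = random.Random(1)
demos = []
for _ in range(3):
    traj = []
    for _ in range(10):
        s = [demo_rng.uniform(-1.0, 1.0), demo_rng.uniform(-1.0, 1.0)]
        traj.append((s, 2.0 * s[0] - s[1], 1.0))  # constant reward of 1 per step
    demos.append(traj)

actor, critic = pretrain_from_demos(demos, dim=2)
print(round(actor.predict([0.5, -0.5]), 3))  # expected to be close to the expert's 1.5
```

In a full pipeline, the warm-started actor and critic would then be handed to an actor-critic learner such as PPO or SAC for online fine-tuning. How the paper actually pre-trains and mixes demonstrations into the RL updates may differ from this sketch.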

Keywords

Deep learning; Deep reinforcement learning; Learning from demonstration (LfD); Actor-critic framework; Robotics

Domains

Robotics [cs.RO]

Main file

CPHS2020.pdf (3.03 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-03434380

Submitted on: Thursday, November 18, 2021, 11:38:24

Last modified on: Thursday, March 16, 2023, 04:10:40

Long-term archiving on: Saturday, February 19, 2022, 18:49:15

Dates and versions

hal-03434380, version 1 (18-11-2021)

Identifiers

HAL Id: hal-03434380, version 1
DOI: 10.1016/j.ifacol.2021.04.227

Cite

Liang Gong, Te Sun, Xudong Li, Ke Lin, Natalia Díaz-Rodríguez, et al. Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments. IFAC-PapersOnLine, 2020, 53 (5), pp. 271-278. ⟨10.1016/j.ifacol.2021.04.227⟩. ⟨hal-03434380⟩


Collections

ENSTA INRIA ENSTA_U2IS INRIA2 IP_PARIS

