Conference paper, Year: 2014

Deterministic Policy Gradient Algorithms

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
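The central claim of the abstract, that the deterministic policy gradient is the expected gradient of the action-value function, takes the form ∇θ J(μθ) = E[∇θ μθ(s) ∇a Q(s, a)|a=μθ(s)]. The sketch below illustrates an off-policy actor-critic update built on that form; the linear actor and critic, the feature map, the noise scale and the step sizes are assumptions made for illustration, not the paper's exact construction.

```python
import numpy as np

# Illustrative sketch of an off-policy deterministic actor-critic update.
# The linear actor/critic, feature map and step sizes are assumptions for
# illustration; they are not the paper's exact algorithm.

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2
theta = 0.1 * rng.standard_normal((action_dim, state_dim))  # actor: mu(s) = theta @ s
w = np.zeros(state_dim + action_dim)                        # critic: Q(s, a) = w @ phi(s, a)
alpha_actor, alpha_critic, gamma = 1e-3, 1e-2, 0.99

def mu(s):
    """Deterministic target policy."""
    return theta @ s

def phi(s, a):
    """Simple state-action features for the linear critic."""
    return np.concatenate([s, a])

def q(s, a):
    return w @ phi(s, a)

def grad_a_q(s, a):
    """Gradient of the linear critic with respect to the action."""
    return w[state_dim:]

def behaviour(s, noise_scale=0.3):
    """Exploratory behaviour policy: the deterministic actor plus Gaussian noise."""
    return mu(s) + noise_scale * rng.standard_normal(action_dim)

def update(s, a, r, s_next):
    """One TD(0) critic step and one deterministic policy gradient actor step."""
    global theta, w
    td_error = r + gamma * q(s_next, mu(s_next)) - q(s, a)
    w = w + alpha_critic * td_error * phi(s, a)
    # Deterministic policy gradient: grad_theta mu(s) composed with grad_a Q(s, a)|a=mu(s).
    # For the linear actor mu(s) = theta @ s this reduces to the outer product below.
    theta = theta + alpha_actor * np.outer(grad_a_q(s, mu(s)), s)
```

A training loop would draw actions from behaviour(s) to generate exploratory transitions and call update(s, a, r, s_next) on each of them, so the deterministic target policy is learned off-policy from the exploratory behaviour policy, as the abstract describes.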
Main file: dpg-icml2014.pdf (335.61 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00938992, version 1 (29-01-2014)

Identifiers

  • HAL Id: hal-00938992, version 1

Cite

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, et al. Deterministic Policy Gradient Algorithms. ICML, Jun 2014, Beijing, China. ⟨hal-00938992⟩
