A Q-Learning Algorithm with Continuous State Space - ENSTA Paris - École nationale supérieure de techniques avancées Paris
Journal Article, Optimization Online, 2006

A Q-Learning Algorithm with Continuous State Space

Pierre Girardeau
  • Role: Author
  • PersonId : 764150
  • IdRef : 151159351
Jean-Sébastien Roy
  • Role: Author

Abstract

In this paper we study a Markov Decision Problem (MDP) with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm, introduced by Watkins in 1989 for completely discrete MDPs, to solve this problem. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We give a convergence proof for this algorithm under usual assumptions. Finally, we illustrate our algorithm by solving the classical mountain car task with continuous state space.
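To make the idea of kernel-based local updates concrete, here is a minimal sketch of Q-learning with a continuous state and discrete actions, where Q(s, a) is carried by weights on a fixed grid of kernel centers and each observed transition updates those weights in proportion to a Gaussian kernel. This is an illustrative assumption, not the authors' exact scheme: the toy environment, kernel, bandwidth, and step-size schedule are all made up for the example.

```python
import numpy as np

def gaussian_kernel(x, centers, bandwidth):
    """Kernel weights of a scalar state x against a grid of centers."""
    return np.exp(-((x - centers) ** 2) / (2.0 * bandwidth ** 2))

def kernel_q_learning(env_step, n_actions, centers, bandwidth=0.1,
                      gamma=0.9, n_steps=5000, seed=0):
    """Stochastic-approximation Q-learning with kernel-local updates.

    Q(s, a) is represented by one weight per (center, action) pair;
    each transition updates the weights of all centers, weighted by
    their kernel distance to the visited state (a hypothetical
    stand-in for the paper's functional-estimation scheme).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((len(centers), n_actions))
    counts = np.zeros(len(centers))          # per-center visit mass
    s = 0.5
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))     # uniform exploration
        s_next, r = env_step(s, a, rng)
        w = gaussian_kernel(s, centers, bandwidth)
        w /= w.sum()
        w_next = gaussian_kernel(s_next, centers, bandwidth)
        # Kernel estimates of Q(s, a) and max_a' Q(s', a').
        q_sa = w @ Q[:, a]
        v_next = np.max(w_next @ Q / w_next.sum())
        delta = r + gamma * v_next - q_sa    # temporal-difference error
        counts += w
        step = 1.0 / (1.0 + counts)          # decreasing step sizes
        Q[:, a] += step * w * delta
        s = s_next
    return Q

# Toy 1-D chain instead of mountain car: actions nudge the state
# left/right inside [0, 1], with reward 1 in the right-hand fifth.
def env_step(s, a, rng):
    move = 0.05 if a == 1 else -0.05
    s_next = float(np.clip(s + move + 0.01 * rng.standard_normal(), 0.0, 1.0))
    reward = 1.0 if s_next > 0.8 else 0.0
    return s_next, reward

centers = np.linspace(0.0, 1.0, 21)
Q = kernel_q_learning(env_step, n_actions=2, centers=centers)
```

The decreasing, per-center step sizes mimic the usual stochastic-approximation conditions under which the paper's convergence proof is stated; a denser center grid or smaller bandwidth trades variance for locality.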
File not deposited

Dates and versions

hal-00977539 , version 1 (11-04-2014)

Identifiers

  • HAL Id : hal-00977539 , version 1

Cite

Kengy Barty, Pierre Girardeau, Jean-Sébastien Roy, Cyrille Strugarek. A Q-Learning Algorithm with Continuous State Space. Optimization Online, 2006. ⟨hal-00977539⟩

Collections

ENSTA UMA_ENSTA
65 Views
0 Downloads
