A Q-Learning Algorithm with Continuous State Space - ENSTA Paris - École nationale supérieure de techniques avancées Paris
Journal Article, Optimization Online, 2006

A Q-Learning Algorithm with Continuous State Space

Pierre Girardeau
  • Role: Author
  • PersonId : 764150
  • IdRef : 151159351
Jean-Sébastien Roy
  • Role: Author

Abstract

In this paper we study a Markov Decision Problem (MDP) with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm, introduced by Watkins in 1989 for completely discrete MDPs, to solve this problem. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We give a convergence proof for this algorithm under usual assumptions. Finally, we illustrate our algorithm by solving the classical mountain car task with continuous state space.
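To make the idea of kernel-based local updates concrete, here is a minimal sketch of Q-learning with a continuous state and discrete actions, where Q(s, a) is carried by weights on a fixed grid of kernel centers and each observed transition updates those weights in proportion to a Gaussian kernel. This is an illustrative assumption, not the authors' exact scheme: the toy environment, kernel, bandwidth, and step-size schedule are all made up for the example.

```python
import numpy as np

def gaussian_kernel(x, centers, bandwidth):
    """Kernel weights of a scalar state x against a grid of centers."""
    return np.exp(-((x - centers) ** 2) / (2.0 * bandwidth ** 2))

def kernel_q_learning(env_step, n_actions, centers, bandwidth=0.1,
                      gamma=0.9, n_steps=5000, seed=0):
    """Stochastic-approximation Q-learning with kernel-local updates.

    Q(s, a) is represented by one weight per (center, action) pair;
    each transition updates the weights of all centers, weighted by
    their kernel distance to the visited state (a hypothetical
    stand-in for the paper's functional-estimation scheme).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((len(centers), n_actions))
    counts = np.zeros(len(centers))          # per-center visit mass
    s = 0.5
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))     # uniform exploration
        s_next, r = env_step(s, a, rng)
        w = gaussian_kernel(s, centers, bandwidth)
        w /= w.sum()
        w_next = gaussian_kernel(s_next, centers, bandwidth)
        # Kernel estimates of Q(s, a) and max_a' Q(s', a').
        q_sa = w @ Q[:, a]
        v_next = np.max(w_next @ Q / w_next.sum())
        delta = r + gamma * v_next - q_sa    # temporal-difference error
        counts += w
        step = 1.0 / (1.0 + counts)          # decreasing step sizes
        Q[:, a] += step * w * delta
        s = s_next
    return Q

# Toy 1-D chain instead of mountain car: actions nudge the state
# left/right inside [0, 1], with reward 1 in the right-hand fifth.
def env_step(s, a, rng):
    move = 0.05 if a == 1 else -0.05
    s_next = float(np.clip(s + move + 0.01 * rng.standard_normal(), 0.0, 1.0))
    reward = 1.0 if s_next > 0.8 else 0.0
    return s_next, reward

centers = np.linspace(0.0, 1.0, 21)
Q = kernel_q_learning(env_step, n_actions=2, centers=centers)
```

The decreasing, per-center step sizes mimic the usual stochastic-approximation conditions under which the paper's convergence proof is stated; a denser center grid or smaller bandwidth trades variance for locality.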
File not deposited

Dates and versions

hal-00977539 , version 1 (11-04-2014)

Identifiers

  • HAL Id : hal-00977539 , version 1

Cite

Kengy Barty, Pierre Girardeau, Jean-Sébastien Roy, Cyrille Strugarek. A Q-Learning Algorithm with Continuous State Space. Optimization Online, 2006. ⟨hal-00977539⟩

Collections

ENSTA UMA_ENSTA
65 Views
0 Downloads
