Preprint, working paper. Year: 2021

Causal Reinforcement Learning using Observational and Interventional Data

Maxime Gasse
Damien Grasset
Guillaume Gaudron
Pierre-Yves Oudeyer

Abstract

Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider a scenario where the learning agent can collect online experiences through direct interaction with the environment (interventional data), but also has access to a large collection of offline experiences obtained by observing another agent interact with the environment (observational data). A key ingredient that makes this situation non-trivial is that the observed agent may act on hidden information that the learning agent never observes. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model? And can we expect the offline experiences to improve the agent's performance? To answer these questions, we import ideas from the well-established causal framework of do-calculus, and we express model-based reinforcement learning as a causal inference problem. We then propose a general yet simple methodology for leveraging offline data during learning. In a nutshell, the method relies on learning a latent-based causal transition model that explains both the interventional and observational regimes, and then using the recovered latent variable to infer the standard POMDP transition model via deconfounding. We prove that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline data (in the asymptotic case), and we illustrate its effectiveness empirically on synthetic toy problems. Our contribution aims at bridging the gap between the fields of reinforcement learning and causality.
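For concreteness, the deconfounding step at the heart of the method can be illustrated on a discrete toy problem. The following minimal NumPy sketch is not the authors' code: the binary state/action/confounder setup and the names p_u, p_a_u, p_s2 are illustrative assumptions. In the actual method the latent variable and its conditional transition model are learned from the combined offline and online data; the sketch instead plugs in the ground-truth parameters to show that the adjustment P(s' | s, do(a)) = Σ_u P(u) P(s' | s, a, u) recovers the interventional transition model, while the naive conditional estimated from observational data alone is biased.

import numpy as np

rng = np.random.default_rng(0)

# Toy one-step setting: state s, hidden confounder u, action a, next state s2.
# The offline (observed) agent picks its action from u, which the learning
# agent never sees, so the observational conditional P(s2 | s, a) is confounded.
n_s, n_a, n_u = 2, 2, 2
p_u = np.array([0.7, 0.3])                 # P(u), here independent of s
p_a_u = np.array([[0.9, 0.1],              # P(a | u): the offline agent's policy,
                  [0.2, 0.8]])             # rows indexed by u, columns by a
p_s2 = rng.dirichlet(np.ones(n_s), size=(n_s, n_a, n_u))  # P(s2 | s, a, u)

s, a = 0, 1

# Naive model fit on observational data only:
#   P(s2 | s, a) = sum_u P(u) P(a | u) P(s2 | s, a, u) / sum_u P(u) P(a | u)
w = p_u * p_a_u[:, a]                      # weight of each u given (s, a)
naive = w @ p_s2[s, a] / w.sum()

# Ground-truth interventional model:
#   P(s2 | s, do(a)) = sum_u P(u) P(s2 | s, a, u)
interventional = p_u @ p_s2[s, a]

# Deconfounding: once a latent u that explains both regimes has been recovered
# (here we reuse the true parameters, so the match is exact by construction),
# the standard POMDP transition model follows from the same adjustment.
deconfounded = p_u @ p_s2[s, a]

print("confounded  P(s2 | s, a)     :", naive)
print("true        P(s2 | s, do(a)) :", interventional)
print("adjusted via recovered latent:", deconfounded)

With these numbers the confounded estimate over-weights u = 1 (since a = 1 is mostly chosen when u = 1), which is exactly the bias the adjustment removes.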
Main file
2106.14421.pdf (936.44 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03465488, version 1 (03-12-2021)

Identifiers

  • HAL Id: hal-03465488, version 1

Cite

Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer. Causal Reinforcement Learning using Observational and Interventional Data. 2021. ⟨hal-03465488⟩
