A Generative Framework for Multimodal Learning of Spatial Concepts and Object Categories: An Unsupervised Part-of-Speech Tagging and 3D Visual Perception Based Approach - ENSTA Paris - École nationale supérieure de techniques avancées Paris Accéder directement au contenu
Communication Dans Un Congrès Année : 2017

A Generative Framework for Multimodal Learning of Spatial Concepts and Object Categories: An Unsupervised Part-of-Speech Tagging and 3D Visual Perception Based Approach

Résumé

Future human-robot collaboration employs language in instructing a robot about specific tasks to perform in its surroundings. This requires the robot to be able to associate spatial knowledge with language to understand the details of an assigned task so as to behave appropriately in the context of interaction. In this paper, we propose a probabilistic framework for learning the meaning of language spatial concepts (spatial prepositions) and object categories based on visual cues representing spatial layouts and geometric characteristics of objects in a tabletop scene. The model investigates unsupervised Part-of-Speech (POS) tagging through a Hidden Markov Model (HMM) that infers the corresponding hidden tags to words. Spatial configurations and geometric characteristics of objects on the tabletop are described through 3D point cloud information that encodes spatial semantics and categories of referents and landmarks in the environment. The proposed model is evaluated through human user interaction with Toyota HSR robot, where the obtained results show the significant effect of the model in making the robot able to successfully engage in interaction with the user in space.
Fichier principal
Vignette du fichier
ICDL-Epirob-2017.pdf (646.29 Ko) Télécharger le fichier
Loading...

Dates et versions

hal-01953470 , version 1 (02-01-2019)

Identifiants

  • HAL Id : hal-01953470 , version 1

Citer

Amir Aly, Akira Taniguchi, Tadahiro Taniguchi. A Generative Framework for Multimodal Learning of Spatial Concepts and Object Categories: An Unsupervised Part-of-Speech Tagging and 3D Visual Perception Based Approach. IEEE International Joint Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Sep 2017, Lisbon, Portugal. ⟨hal-01953470⟩
36 Consultations
127 Téléchargements

Partager

Gmail Facebook X LinkedIn More