Stochastic models for document restructuration - Université Pierre et Marie Curie
Conference paper, Year: 2005

Stochastic models for document restructuration

Abstract

Document (re)structuration consists of mapping documents that come from different sources and use different formats onto a predefined semi-structured format. This generic problem appears in several application settings, such as querying heterogeneous semi-structured databases, peer-to-peer systems, legacy document conversion, and XML information retrieval. In the paper, we define the restructuration problem from a document-centric perspective and identify the main issues raised by this new problem setting. We then consider two restructuration instances: structuring flat documents and learning the correspondence between structured formats. We propose stochastic models for these two tasks and describe tests on a large XML document collection.
No file deposited

Dates and versions

hal-01357589, version 1 (30-08-2016)

Identifiers

  • HAL Id: hal-01357589, version 1

Cite

Patrick Gallinari, Guillaume Wisniewski, Francis Maes, Ludovic Denoyer. Stochastic models for document restructuration. ECML'05 Workshop on Relational Machine Learning, Oct 2005, Porto, Portugal. ⟨hal-01357589⟩