Yet Another Hybrid Segmentation Tool - Université Pierre et Marie Curie
Conference poster. Year: 2012

Yet Another Hybrid Segmentation Tool

Andrés Sanoja
  • Role: Author
  • PersonId: 934855
Stéphane Gançarski

Abstract

In this paper we present an overview of a prototype we are developing for use in the context of web archives (page comparison, crawling and information retrieval). It analyses pages based on their DOM tree information and their visual rendering. The tool implements a modified version of VIPS with the aim of improving the precision of visual block extraction and of hierarchy construction. First, the visual rendering of a page, as produced by several browsers, is segmented into rectangular blocks. Then, the extracted blocks are examined for visual overlaps, which are resolved using an adapted version of the XY-Cut algorithm. As a result, blocks may be either rectangular or non-rectangular. Finally, the visual block tree, which represents the layout of the page, is analysed to obtain a more coherent layout.
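The abstract does not detail how the adapted XY-Cut resolves overlaps. As a rough illustration only, the sketch below assumes each block is an axis-aligned rectangle (x, y, width, height in pixels), detects visually overlapping pairs, and runs the textbook recursive XY-Cut, which splits a set of blocks at whitespace gaps in their projection onto the y axis and then the x axis. `Block`, `find_overlaps`, `split_on_axis` and `xy_cut` are hypothetical names; this is not the authors' adapted algorithm, which additionally yields non-rectangular blocks.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned block taken from a rendered page (pixels)."""
    name: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def right(self):
        return self.x + self.w

    @property
    def bottom(self):
        return self.y + self.h


def overlaps(a, b):
    """True if the interiors of the two rectangles intersect (shared edges do not count)."""
    return a.x < b.right and b.x < a.right and a.y < b.bottom and b.y < a.bottom


def find_overlaps(blocks):
    """Return the names of every pair of blocks that overlap visually."""
    return [(a.name, b.name) for a, b in combinations(blocks, 2) if overlaps(a, b)]


# Projection axes tried in order: first y (horizontal cuts), then x (vertical cuts).
AXES = (
    (lambda b: b.y, lambda b: b.bottom),
    (lambda b: b.x, lambda b: b.right),
)


def split_on_axis(blocks, lo, hi):
    """Partition blocks at every whitespace gap in their projection on one axis."""
    ordered = sorted(blocks, key=lo)
    groups, current = [], [ordered[0]]
    reach = hi(ordered[0])  # furthest end coordinate seen so far
    for b in ordered[1:]:
        if lo(b) >= reach:  # no earlier block extends past this point: cut here
            groups.append(current)
            current = []
        current.append(b)
        reach = max(reach, hi(b))
    groups.append(current)
    return groups


def xy_cut(blocks):
    """Textbook recursive XY-Cut over rectangles.

    Returns a nested list mirroring the block hierarchy. A group with no
    gap on either axis (overlapping or interlocking blocks) is returned as a
    tuple of names, i.e. left unresolved for a later step to handle.
    """
    if len(blocks) == 1:
        return blocks[0].name
    for lo, hi in AXES:
        groups = split_on_axis(blocks, lo, hi)
        if len(groups) > 1:
            return [xy_cut(g) for g in groups]
    return tuple(sorted(b.name for b in blocks))


page = [
    Block("header", 0, 0, 800, 100),
    Block("nav", 0, 100, 200, 500),
    Block("content", 200, 100, 600, 500),
    Block("ad", 700, 150, 150, 100),  # floats over the right edge of "content"
    Block("footer", 0, 600, 800, 50),
]
print(find_overlaps(page))  # [('content', 'ad')]
print(xy_cut(page))         # ['header', ['nav', ('ad', 'content')], 'footer']
```

In this toy page the header, navigation column and footer separate cleanly, while the overlapping ad and content blocks come back as an unresolved group. Resolving such groups, and producing non-rectangular blocks from them, is where the authors' adaptation would go beyond this sketch.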

Domains

Computer Science
Main file
ipres2012-sanoja-gancarski.pdf (496.41 KB)
ipres_v3.pdf (2.4 MB)
Origin: Files produced by the author(s)
Format: Other

Dates and versions

hal-00770527, version 1 (7 January 2013)

Identifiers

  • HAL Id: hal-00770527, version 1

Cite

Andrés Sanoja, Stéphane Gançarski. Yet Another Hybrid Segmentation Tool. iPRES 2012 – 9th International Conference on Preservation of Digital Objects, Oct 2012, Toronto, Canada. ⟨hal-00770527⟩
