An experimental analysis of regression-obtained HPC scheduling heuristics - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper Year: 2023

An experimental analysis of regression-obtained HPC scheduling heuristics

Abstract

Scheduling jobs on High-Performance Computing (HPC) platforms typically relies on heuristics built from job sorting functions, either standard ones such as First-Come-First-Served or custom, hand-engineered ones. Linear regression methods are promising for exploiting scheduling data to create simple and transparent heuristics with lower computational overhead than state-of-the-art learning methods. The drawback is lower scheduling performance. We experimentally investigated the hypothesis that the scheduling performance of regression-obtained heuristics can be increased by increasing the complexity of the sorting functions and exploiting derived job features. We used multiple linear regression to develop a factory of scheduling heuristics based on scheduling data. This factory uses general polynomials of the jobs' characteristics as templates for the scheduling heuristics. We defined a set of polynomials of increasing complexity and used our factory to create scheduling heuristics based on them. We evaluated the performance of the obtained heuristics with wide-ranging simulation experiments using real-world traces from 1997 to 2016. Our results show that large polynomials led to unstable scheduling heuristics due to multicollinearity effects in the regression, whereas small polynomials led to stable and efficient scheduling performance. These results indicate that (i) multicollinearity imposes a constraint when one wants to derive new features (i.e., feature engineering) for creating scheduling heuristics with regression, and (ii) regression-obtained scheduling heuristics can be resilient to the long-term evolution of HPC platforms and workloads.
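As a rough illustration of the idea summarized in the abstract, the sketch below fits polynomial job-sorting functions of increasing degree with ordinary least squares and reports the condition number of each design matrix, a common multicollinearity diagnostic. Everything specific here is an assumption made for illustration and is not taken from the paper: the three job features `p`, `q`, and `r`, the synthetic target scores, the helper names `polynomial_features`, `fit_heuristic`, and `sort_queue`, and the convention that a lower score means an earlier position in the queue.

```python
import numpy as np


def polynomial_features(p, q, r, degree):
    """Return a design matrix with an intercept and every monomial
    p^a * q^b * r^c such that 1 <= a + b + c <= degree."""
    cols = [np.ones_like(p)]
    names = ["1"]
    for a in range(degree + 1):
        for b in range(degree + 1 - a):
            for c in range(degree + 1 - a - b):
                if a + b + c == 0:
                    continue  # the intercept column is already present
                cols.append(p**a * q**b * r**c)
                names.append(f"p^{a} q^{b} r^{c}")
    return np.column_stack(cols), names


def fit_heuristic(X, scores):
    """Multiple linear regression (ordinary least squares) of scores on X."""
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef


def sort_queue(coef, X):
    """Order jobs by predicted score (assumption: lower score runs first)."""
    return np.argsort(X @ coef)


# Synthetic, made-up data: three hypothetical job features scaled to (0.1, 1).
rng = np.random.default_rng(0)
n = 500
p = rng.uniform(0.1, 1.0, n)  # hypothetical: estimated processing time
q = rng.uniform(0.1, 1.0, n)  # hypothetical: requested resources
r = rng.uniform(0.1, 1.0, n)  # hypothetical: submission time
scores = 0.5 * p + 0.3 * q - 0.2 * r + rng.normal(0.0, 0.01, n)

n_terms = {}
conds = {}
orders = {}
for degree in (1, 2, 3, 4):
    X, names = polynomial_features(p, q, r, degree)
    coef = fit_heuristic(X, scores)
    n_terms[degree] = X.shape[1]
    conds[degree] = np.linalg.cond(X)  # grows as correlated monomials are added
    orders[degree] = sort_queue(coef, X)
    print(f"degree={degree} terms={n_terms[degree]} cond={conds[degree]:.3g}")
```

Because each higher-degree design matrix contains all the columns of the lower-degree ones, its condition number cannot decrease. That makes this a simple way to see how larger polynomial templates can make the regression coefficients, and hence the resulting job ordering, more sensitive to small changes in the data.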
Main file
working_paper_hal.pdf (1.42 MB)
Origin: Files produced by the author(s)
License: CC BY - Attribution

Dates and versions

hal-03979237, version 1 (08-02-2023)
hal-03979237, version 2 (27-06-2023)

License

Attribution

Identifiers

  • HAL Id: hal-03979237, version 1

Cite

Lucas de Sousa Rosa, Danilo Carastan-Santos, Alfredo Goldman. An experimental analysis of regression-obtained HPC scheduling heuristics. 2023. ⟨hal-03979237v1⟩