Co-scheduling for large-scale applications : memory and resilience

Loïc Pottier
Abstract : This thesis explores co-scheduling problems in the context of large-scale applications, with two main focuses: the memory side, in particular cache memory, and the resilience side. With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units is increasing. In this context, the benefits of co-scheduling techniques have been demonstrated. Recall that the main idea behind co-scheduling is to execute applications concurrently rather than in sequence in order to improve the global throughput of the platform. But sharing resources often generates interference. With the rising number of processing units accessing the same last-level cache, interference among co-scheduled applications becomes critical. In addition, as the number of processors increases, so does the probability of a failure. Resilience must be taken into account, especially for co-scheduling, because failure-prone resources might be shared between applications. On the memory side, we focus on interference in the last-level cache; one solution used to reduce this interference is cache partitioning. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed. We also investigate the same problem on a real cache-partitioned chip multiprocessor, using the Cache Allocation Technology recently provided by Intel. Second, still on the memory side, we study how to model and schedule task graphs on new many-core architectures, such as the Knights Landing architecture. These architectures offer a new level in the memory hierarchy through a new on-package high-bandwidth memory.
Current approaches usually do not take this new memory level into account, yet new scheduling algorithms and data partitioning schemes are needed to take advantage of this deep memory hierarchy. On the resilience side, we explore the impact of failures on co-scheduling performance. The co-scheduling approach has been demonstrated in a fault-free context, but large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed for large applications to execute efficiently. Indeed, failures may create severe imbalance between applications and significantly degrade performance. We aim at minimizing the expected completion time of a set of co-scheduled applications in a failure-prone context by redistributing processors.

https://tel.archives-ouvertes.fr/tel-01892395
Contributor : Abes Star
Submitted on : Wednesday, October 10, 2018 - 3:43:24 PM
Last modification on : Saturday, December 15, 2018 - 3:30:36 AM
Long-term archiving on : Friday, January 11, 2019 - 5:01:32 PM

File

POTTIER_Loic_2018LYSEN039_Thes...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01892395, version 1

Citation

Loïc Pottier. Co-scheduling for large-scale applications : memory and resilience. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lyon, 2018. English. ⟨NNT : 2018LYSEN039⟩. ⟨tel-01892395⟩
