Publications


Below is a list of publications produced over the course of the CloudLightning Project.


ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Authors

T. Falch (NTNU), A. C. Elster (NTNU)

Abstract

Applications written for heterogeneous CPU-GPU systems often suffer from poor performance portability. Finding good work partitions can also be challenging as different devices are suited for different applications.

This article describes ImageCL, a high-level domain-specific language and source-to-source compiler, targeting single system as well as distributed heterogeneous hardware. Initially targeting image processing algorithms, our framework now also handles general stencil-based operations. It resembles OpenCL, but abstracts away performance optimization details which instead are handled by our source-to-source compiler. Machine learning-based auto-tuning is used to determine which optimizations to apply. For the distributed case, by measuring performance counters on a small input on one device, previously trained performance models are used to predict the throughput of the application on multiple different devices, making it possible to balance the load evenly. Models for the communication overhead are created in a similar fashion and used to predict the optimal number of nodes to use.

Open Access: Yes

Read

Online Resource Coalition Reorganization for Efficient Scheduling on the Intercloud

Authors

A. Spătaru (IeAT), T. Selea (IeAT), M. Frincu (IeAT)

Abstract

While users running applications on the intercloud can access configurations unavailable on any single cloud, they are faced with VM performance fluctuations among providers, and even within the same provider, as recent papers have indicated. These fluctuations can impact an application’s objectives. A solution is to cluster intercloud resources into coalitions working together towards a common goal, i.e., ensuring that the deviation from the objectives is minimal. These coalitions are formed based on historical information on the performance of the underlying resources, by assuming that patterns in the deployment of the applications are repeatable. However, static coalitions can lead to underutilized resources due to the fluctuating job flow, which renders this information obsolete. In this paper we propose an online coalition formation metaheuristic which allows us to update existing coalitions and create new ones at run time based on the job flow. We test our AntClust online coalition formation method against a static coalition formation approach.

Open Access: No

Read

A Generic Framework Supporting Self-Organisation and Self-Management in Hierarchical Systems

Authors

H. Xiong (UCC), A. Spătaru (IeAT), G. G. Castañé (UCC), D. Dong (UCC), G. A. Gravvanis (DUTH), J. P. Morrison (UCC)

Abstract

A novel, generic framework for supporting self-organisation and self-management in hierarchical systems is presented. The framework allows for the incorporation of local self-organising and self-managing strategies at each level in the hierarchy. These local strategies determine the behaviour of that level, and the effects of these strategies can be communicated to, and used by, the strategies in adjacent levels of the hierarchy. Thus, in general, strategies may be viewed as parameterised functions. Information emanating from both the lower and the upper levels in the hierarchy can be used as parameters. Information from below represents the status of the lower levels, whereas information from above can be used to influence the direction and the rate of system evolution. As the component parts of the system evolve to their goal states, the rate of evolution slows. At that point, by definition, a component is maximally contributing to the global goal state of the system as a whole. A novel concept to measure the distance of a component from this stasis, its Suitability Index, is presented and formally defined. Although the proposed framework can be generalised to any hierarchical system, this paper applies it specifically to large scale, hierarchically structured, computer systems. An implementation of this framework and an empirical study of its effectiveness have been conducted as part of the CloudLightning Project.

Open Access: Yes

Read

The CloudLightning approach to cloud-user interaction

Authors

T. Selea (IeAT), I. Drăgan (IeAT), T.-F. Fortiş (IeAT)

Abstract

Deploying applications that require computational power or data-intensive capabilities into a cloud environment can become overwhelming, due to the diversity of cloud services and resources. CloudLightning allows users to design their application topology and deploy it without the need to select the most suitable resources. The task of finding the resources for an application is taken care of by the CloudLightning self-organizing, self-managing component. In this paper we describe the functionality of the Gateway Service, the component that enables users to access the CloudLightning functionality.

Open Access: No

Read

Understanding the Determinants of Cloud Computing Adoption for High Performance Computing

Authors

T. Lynn (DCU), X. Liang (TCD), A. Gourinovitch (DCU), J. Morrison (UCC), G. Fox (DCU), P. Rosati (DCU)

Abstract

Within the complex context of high performance computing (HPC), the factors influencing technology adoption decisions remain largely unexplored. This study extends Diffusion of Innovation (DOI) and Human-Organization-Technology fit (HOT-fit) theories into an integrated model, to explore the impact of ten factors on cloud computing adoption decisions in the HPC context. The results suggest that adopters and non-adopters have different perceptions of the indirect benefits, adequacy of resources, top management support, and compatibility of adopting cloud computing for HPC. In addition, perceptions of the indirect benefits and HPC competences can be used to predict the cloud computing adoption decision for HPC. This is one of the first studies in the information systems (IS) literature exploring the factors impacting the cloud computing adoption decision in the important context of HPC. It integrates two influential technology adoption theories and enhances understanding of the key factors influencing organizations’ cloud computing adoption decisions in this context.

Open Access: Yes

Read

A Comparative Study of CPU Power Consumption Models for Cloud Simulation Frameworks

Authors

A. Makaratzis (CERTH), C. K. Filelis-Papadopoulos (DUTH), K. Giannoutakis (CERTH), G. A. Gravvanis (DUTH) and D. Tzovaras (CERTH)

Abstract

In this paper a comparative study of CPU power models that have been widely used in cloud simulation environments is conducted. Generic CPU power models for estimating the energy consumption of CPU servers have been proposed in cloud simulation frameworks, since estimations of the energy consumption of cloud computing infrastructures can be obtained through simulation experimentation. The main characteristic of these models is that they have low computational complexity and can be applied to a wide range of modern CPU servers. A recently proposed CPU power consumption model, based on a third-degree polynomial, is examined and evaluated. A comparative study based on available CPU power measurements of CPU servers, obtained from the SPEC benchmark, is conducted. Additionally, experimentation on three CPU servers is performed and power measurements are obtained for various workloads in order to evaluate the estimations of the different CPU power models.
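The shape of the polynomial model family under evaluation can be sketched as follows. The coefficients here are invented for illustration only, not values from the paper: a third-degree model maps CPU utilization u ∈ [0, 1] to an estimated power draw.

```python
def cubic_power(u, p_idle, c1, c2, c3):
    """Third-degree polynomial CPU power model:
    P(u) = p_idle + c1*u + c2*u**2 + c3*u**3, with u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return p_idle + u * (c1 + u * (c2 + u * c3))  # Horner's scheme

# Illustrative coefficients: roughly 70 W idle, 200 W at full load.
coeffs = dict(p_idle=70.0, c1=60.0, c2=120.0, c3=-50.0)
print(cubic_power(0.0, **coeffs))  # 70.0 (idle power)
print(cubic_power(1.0, **coeffs))  # 200.0 (full-load power)
```

In practice such coefficients would be fitted against measured utilization/power pairs, e.g. those published for SPEC benchmark runs, one fit per server model.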

Open Access: Yes

Read

Benchmarking the WRF Model on Bluegene/P, Cluster, and Cloud Platforms and Accelerating Model Setup through Parallel Genetic Algorithms

Authors

L. Oana and M. Frincu (IeAT)

Abstract

This paper investigates the scalability of the WRF (Weather Research and Forecasting) model on three different platforms: a BlueGene/P, an Intel Xeon cluster and the Microsoft Azure cloud, at different resolutions and domain sizes. Contrary to prior work, we benchmark the model on a cloud platform, analyze the behavior of various individual configurations, and test the scalability of our previously proposed parallel genetic algorithm for the physical parametrization of WRF. While we obtain good results on all platforms, the speedup is particularly interesting on the cloud platform, peaking at 10x using the OpenMP library with up to 20 processors. This is similar to the speedup achieved on BlueGene/P with 1,024 cores. On the BlueGene/P and the Xeon cluster we used the MPI library, which provided speedups ranging between 2x and 10x. Overall, running on Azure or the Xeon cluster is more effective than running experiments on BlueGene/P, especially at low resolutions and with few processors. Finally, we tested the scalability of a parallel GA implementation with the purpose of finding optimal physical parametrization settings for WRF and achieved speedups of up to 6x, which proved superior to those from a similar experiment done in a prior paper.

Open Access: Yes

Read

Utility-Driven Deployment Decision Making

Authors

R. Loomba (Intel), T. Metsch (Intel), L. Feehan (Intel), J. Butler (Intel)

Abstract

This paper presents the formulation of two utility functions, for the resource provider and the service customer respectively, to enable the orchestrator to consider business goals/incentives when making comparisons between deployments. The use of cost elements in these utilities further supports a precise exposure of the value proposition of differentiated or heterogeneous infrastructure, which is greatly beneficial to resource providers.
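The two-sided utility idea can be made concrete with a toy sketch. The functional forms, names, and weights below are invented for illustration and are not the paper's formulation: each side scores a candidate deployment, and an orchestrator can then compare deployments on both scores.

```python
def provider_utility(revenue, energy_cost, sla_penalty):
    """Toy provider-side utility: net value of hosting a deployment."""
    return revenue - energy_cost - sla_penalty

def customer_utility(perf, price, perf_weight=1.0, price_weight=0.5):
    """Toy customer-side utility: weighted performance minus weighted price."""
    return perf_weight * perf - price_weight * price

# Compare two hypothetical deployments from each side's perspective:
# A is cheaper to run; B delivers more performance for the same price.
a = (provider_utility(100.0, 30.0, 0.0), customer_utility(80.0, 100.0))
b = (provider_utility(100.0, 55.0, 5.0), customer_utility(95.0, 100.0))
print(a, b)  # (70.0, 30.0) (40.0, 45.0)
```

The point of explicit cost elements, as the abstract notes, is that accelerated or otherwise differentiated hardware can shift both scores at once, exposing its value proposition directly in the comparison.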

Open Access: No

Read

Large-scale simulation of a self-organizing self-managing cloud computing framework

Authors

C. K. Filelis-Papadopoulos (DUTH), K. M. Giannoutakis (CERTH), G. A. Gravvanis (DUTH), D. Tzovaras (CERTH)

Abstract

A recently introduced cloud simulation framework is extended to support self-organizing and self-managing local strategies in the cloud resource hierarchy. This dynamic hardware resource allocation system evolves toward the goals defined by local strategies, expressed as the maximization of: energy efficiency of cloud infrastructures, task throughput, computational efficiency and resource management efficiency. Heterogeneous hardware resources are considered that include, apart from commodity CPU servers, hardware accelerators such as GPUs, MICs and FPGAs, thus forming a heterogeneous cloud infrastructure. Energy consumption and task execution models for the heterogeneous accelerators are also proposed, in order to demonstrate the energy efficiency of the proposed resource allocation system. Implementation details of the new functionalities in the parallel cloud simulation framework are discussed, while numerical results are given for the scalability and utilization of the cloud elements using the self-organization and self-management framework with two VM placement strategies.

Open Access: Yes

Read

Energy Modeling in Cloud Simulation Frameworks

Authors

A. Makaratzis (CERTH), K. Giannoutakis (CERTH), D. Tzovaras (CERTH)

Abstract

Research on cloud simulators has been quite intensive in recent years, mainly because the need for powerful computational resources has led organizations to use cloud resources instead of acquiring and maintaining private servers. In order to test and optimize the strategies used on cloud resources, cloud simulators have been developed, since the cost of simulation is substantially smaller than that of experimenting on real cloud environments. Several cloud simulation frameworks have been proposed during the last years, focusing on various components of cloud resources. In this paper, a survey of cloud simulators is conducted, in order to examine the different models that have been used for the hardware components that constitute a cloud data center. Focus is given to the energy models that have been proposed for predicting the energy consumption of data center components, such as CPU, memory, storage and network, while experiments are performed in order to compare the different power models used by the simulation frameworks. The following cloud simulation frameworks are considered: CloudSched, CloudSim, DCSim, GDCSim, GreenCloud and iCanCloud.

Open Access: Yes

Read

A Framework for Simulating Large Scale Cloud Infrastructures

Authors

C. Filelis-Papadopoulos (DUTH), G.A. Gravvanis (DUTH), P.E. Kyziropoulos

Abstract

Cloud infrastructures are continuously growing in size, as more cloud nodes are added to already existing hyper-scale infrastructures. These hyper-scale infrastructures are also becoming heterogeneous, as different types of accelerators are added in order to increase performance per watt for certain types of applications and to allow various HPC workloads to migrate to Cloud environments. The introduction of diverse workloads migrating to the Cloud, along with the increasing volume of incoming tasks, results in phenomena of network congestion, underutilization and resource fragmentation. Simulators are used to analyze, study and possibly improve Cloud environments. However, existing Cloud simulation tools lack the ability to handle heterogeneous resources and tasks that span multiple Cloud nodes. Moreover, they are mostly sequential and cannot scale to large numbers of Cloud nodes. Furthermore, they do not support over-commitment, which is a common practice in real-world Cloud environments. A framework for simulating large numbers of heterogeneous cloud nodes organized in Cells and executing large numbers of HPC tasks is proposed. The framework is inherently parallel and designed for hybrid distributed memory parallel systems, supporting CPU, memory and network over-commitment. The simulation framework is based on a time-advancing loop, allowing dynamic change of the granularity of the simulator and minimizing memory requirements, since only data related to the current time-step is stored. Moreover, a latency model for the currency of data in the Gateway Service and Broker is also supported. Implementation details, along with discussions concerning the extensibility of the framework, are given. Numerical results for simulating large numbers of heterogeneous resources and incoming tasks are also presented.
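The time-advancing loop at the heart of such a simulator can be sketched in a few lines. This is a deliberately simplified, single-process illustration; the fixed time-step, the task representation, and the dispatch policy are assumptions, not the framework's actual design:

```python
def simulate(tasks, num_servers, dt=1.0, horizon=100.0):
    """Minimal time-advancing simulation loop: at each step, retire
    finished tasks, then dispatch arrivals to free servers.
    `tasks` is a list of (arrival_time, duration) pairs."""
    t, pending = 0.0, sorted(tasks)
    running = []   # completion times of in-flight tasks
    finished = 0
    while t < horizon and (pending or running):
        # Retire tasks that completed by the current time-step.
        done = [c for c in running if c <= t]
        finished += len(done)
        running = [c for c in running if c > t]
        # Dispatch arrivals while server capacity remains.
        while pending and pending[0][0] <= t and len(running) < num_servers:
            arrival, duration = pending.pop(0)
            running.append(t + duration)
        t += dt
    return finished

print(simulate([(0.0, 2.0), (0.0, 3.0), (1.0, 1.0)], num_servers=2))  # 3
```

Only the state of the current time-step is kept, which is what keeps memory requirements bounded; varying `dt` corresponds to the dynamic granularity the abstract mentions.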

Open Access: Yes

Read

CloudLightning: a Self-Organized Self-Managed Heterogeneous Cloud

Authors

H. Xiong (UCC), D. Dong (UCC), C. Filelis-Papadopoulos (DUTH), G. G. Castane (UCC), T. Lynn (DCU), D. C. Marinescu (UCF), J. P. Morrison (UCC)

Abstract

The increasing heterogeneity of cloud resources, and the increasing diversity of services being deployed in cloud environments, are leading to significant increases in the complexity of cloud resource management. This paper presents an architecture to manage heterogeneous resources and to improve service delivery in cloud environments. A loosely-coupled, hierarchical, self-adapting management model, deployed across multiple layers, is used for heterogeneous resource management. Moreover, a service-specific coalition formation mechanism is employed to identify appropriate resources to support the process parallelism associated with high performance services. Finally, a proof-of-concept of the proposed hierarchical cloud architecture, as realized in the CloudLightning project, is presented.

Open Access: Yes

Read

Towards a Scalable and Adaptable Resource Allocation Framework in Cloud Environments

Authors

H. Xiong (UCC), C. Filelis-Papadopoulos (CERTH), D. Dong (UCC), G.G. Castañé (UCC), J.P. Morrison (UCC)

Abstract

Finding an appropriate resource to host the next application to be deployed in a Cloud environment can be a non-trivial task. To deliver the appropriate level of service, the functional requirements of the application must be met. Ideally, this process involves filtering the best resource from a number of possible candidates, whilst simultaneously satisfying multiple objectives. If timely responses to resource requests are to be maintained, the sophistication of the filtering mechanism and the size of the search space have to be carefully balanced. The quality of the solution will thus not readily scale with growth in cloud resources and filtering complexity. This limitation is becoming more evident with the emergence of hyper-scale clouds and with the increased complexity needed to accommodate the growing heterogeneity in resources. Moreover, meeting non-functional requirements, reflecting the Cloud Service Provider’s business objectives, is also becoming increasingly critical, as service utilization and energy efficiency in a typical cloud deployment are extremely low. This paper reexamines the resource allocation problem, proposing a framework that supports distributed resource allocation decisions and that can be dynamically populated with strategies reflecting the ever-growing number of diverse objectives as they become evident in the evolving cloud infrastructure.

Open Access: No

Read

Towards the Integration of a HPC Build System in the Cloud Ecosystem

Authors

I. Drăgan (IeAT), T. Selea (IeAT), T-F. Fortiş (IeAT)

Abstract

As cloud computing matures, and as the resources, and the levels at which they can be accessed, diversify, there is a growing need to identify specialized languages and technologies that can provide a high level of flexibility and transparency in accessing, managing, and utilizing these resources. In this context, the alignment of these capabilities with current developments, especially at the topology, orchestration and management level, becomes a necessity. Such an implementation is usually based on a self-* approach, which is also suitable for supporting the migration of selected HPC applications to the cloud. Our research is based on a self-organizing, self-managing approach and investigates the option of self-configuration, supported by the EasyBuild toolchain.

Open Access: No

Read

Communicating Complex Services Using Electronic Word-of-Mouth on Twitter: An Exploratory Study

Authors

A. Gourinovitch (DCU), X. Liang (DCU), P. Rosati (DCU), T. Lynn (DCU)

Abstract

Social networking sites have radically changed the way individuals and organizations interact and share information. Word-of-mouth (WOM) is one of the most investigated phenomena in marketing research, and it affects customers’ trust and purchase decisions. This is particularly true for complex products and services. Nowadays, word-of-mouth mostly occurs through digital platforms. While there is general agreement that electronic word-of-mouth (eWOM) can have a potentially larger impact than traditional WOM, empirical research on the determinants of eWOM impact is still at an early stage. This exploratory study presents an analysis of the antecedents of eWOM impact on Twitter for 4,569 original posts generated by 1,771 accounts relating to a complex service, i.e. high performance computing in the cloud. The empirical analysis suggests that message sentiment and user activity are positively related to eWOM impact, while the effects of user visibility and user credibility tend to be negative. This study represents a preliminary investigation of the antecedents of eWOM for complex services on Twitter and prepares the ground for future research.

Open Access: Gold

Read

Scheduling Data Stream Jobs on Distributed Systems with Background Load

Authors

A. Vulpe (IeAT), M. Frincu (IeAT)

Abstract

Cloud computing is used by numerous applications tailored for on-demand execution on elastic resources. While most cloud based applications rely mostly on virtualization, an emerging technology based on lightweight containers is starting to gain traction. While most research on job scheduling on clouds has focused on dedicated machines, the emergence and applicability of containers on a wider range of platforms including IoT, reopens the issue of scheduling on non-dedicated machines with high priority background load.

In this paper we address this problem by proposing a model and several heuristics for scheduling non-preemptive data stream jobs on containers running on machines with background load. We also address the issue of estimating the container parameters. The heuristics are tested and analyzed based on real-life traces.

Open Access: Gold

Read

Architecting a Hybrid Cross Layer Dew-Fog-Cloud Stack for Future Data-Driven Cyber-Physical Systems

Authors

M. Frincu (IeAT)

Abstract

The Internet of Things is gaining traction due to the emergence of smart devices surrounding our daily lives. These cyber-physical systems (CPS) are highly distributed, communicate over Wi-Fi or other wireless links, and generate massive amounts of data. In addition, many of these systems require near real-time control (RTC). In this context, future IT platforms will have to adapt to the Big Data challenge by bringing intelligence to the edge of the network (dew computing) for low-latency, fast local decisions, while at the same time keeping a centralized control based on the well-established scalable and fault-tolerant technologies brought to life by cloud computing. In this paper we address this challenge by proposing a hybrid cross-layer dew-fog-cloud architecture tailored for large scale data-driven CPSs. Our solution will help catalyze the next generation of computational platforms, where mobile and dynamic IoT platforms with energy and computational constraints will be used on demand for storing and computing nearby Big Data in near real-time for local decisions, and will extend to cloud systems for fast, orchestrated, centralized decisions. The proposed architecture aims to leverage the advantages of both cloud and dew systems to overcome the challenges and limitations of modern communication networks. We also discuss two real-life solutions from the fields of smart grids and smart transportation systems.

Open Access: Gold

Read

Separation of Concerns in Heterogeneous Cloud Environments

Authors

D. Dong (UCC), H. Xiong (UCC), J. Morrison (UCC)

Abstract

The majority of existing cloud service management frameworks implement tools, APIs, and strategies for managing the lifecycle of cloud applications and/or resources. They are provided as a self-service interface to cloud consumers. This self-service approach implicitly allows cloud consumers to have full control over the management of applications as well as the underlying resources such as virtual machines and containers. This subsequently narrows down the opportunities for Cloud Service Providers to improve resource utilization, power efficiency and potentially the quality of services. This work introduces a service management framework centred around the notion of Separation of Concerns. The proposed service framework addresses the potential conflicts between cloud service management and cloud resource management while maximizing user experience and cloud efficiency on each side. This is particularly useful as the current homogeneous cloud is evolving to include heterogeneous resources.

Open Access: Gold

Read

Managing and Unifying Heterogenous Resources in Cloud Environments

Authors

D. Dong (UCC), H. Xiong (UCC), P. Stack (UCC), J. P. Morrison (UCC)

Abstract

A mechanism for accessing heterogeneous resources through the integration of various cloud management platforms is presented. In this scheme, hardware resources are offered using virtualization, containerization and as bare metal. Traditional management frameworks for managing these offerings are employed and invoked using a novel resource coordinator. This coordinator also provides an interface for cloud consumers to deploy applications on the underlying heterogeneous resources. The realization of this scheme in the context of the CloudLightning project is presented and a demonstrative use case is given to illustrate the applicability of the proposed solution.

Open Access: Gold

Read

Applying Self-* Principles in Heterogeneous Cloud Environments

Authors

I. Drăgan , T-F. Fortiş, G. Iuhasz, M. Neagul, D. Petcu (IeAT)

Abstract

Nowadays we are witnessing multiple changes in the way data- and compute-intensive services are offered to users, due to the influences of cloud computing, autonomic computing, and the ever-increasing heterogeneity of computing resources. One particular example of such influences is the case of self-* principles, which are intended to offer the basis for interesting alternatives to the traditional ways of computing. This chapter aims to give a brief overview of the basic concepts being used in practice and theory in order to advance the field of self-* clouds to new horizons.

Open Access: No

Read

A Review of Cloud Computing Simulation Platforms and Related Environments

Authors

J. Byrne, S. Svorobej, K. Giannoutakis (CERTH), D. Tzovaras (CERTH), PJ Byrne, P-O Östberg, A. Gourinovitch (DCU), T. Lynn (DCU)

Abstract

In recent years, there has been a significant increase in the development and extension of tools to support discrete event simulation (DES) for cloud computing, resulting in a wide range of tools which vary in terms of their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. This review identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed memory systems).

Open Access: Gold

Read

A Preliminary Systematic Review of Computer Science Literature on Cloud Computing Research using Open Source Simulation Platforms

Authors

T. Lynn (DCU), A. Gourinovitch (DCU), J. Byrne, PJ Byrne, S. Svorobej, K. Giannoutakis (CERTH), D. Kenny (UCC) and J.P. Morrison (UCC)

Abstract

This paper provides a preliminary systematic review of literature on this topic covering 256 papers from 2009 to 2016. The paper aims to provide insights into the current status of cloud computing research using open source cloud simulation platforms. Our two-level analysis scheme includes a descriptive and synthetic analysis against a highly cited taxonomy of cloud computing. The analysis uncovers some imbalances in research and the need for a more granular and refined taxonomy against which to classify cloud computing research using simulators. The paper can be used to guide literature reviews in the area and identifies potential research opportunities for cloud computing and simulation researchers, complementing extant surveys on cloud simulation platforms.

Open Access: Gold

Read

Self-Healing in a Decentralised Cloud Management System

Authors

P. Stack (UCC), H. Xiong (UCC), D. Mersel, M. Makhloufi, G. Terpend, D. Dong (UCC)

Abstract

With the advent of heterogeneous resources and increasing scale, present cloud environments are becoming more and more complex. In order to manage heterogeneous cloud infrastructures at scale, in a reliable and robust manner, systems and services with autonomic behaviours are advantageous. In this paper, self-healing concepts are introduced for autonomic cloud management. A layered master-slave structure is proposed, providing reliability and high availability for a decentralised, hierarchical cloud architecture.

Open Access: No

Read

Cloud Deployment and Management of Dataflow Engines

Authors

N. Trifunovic (Maxeler), H. Palikareva (Maxeler), T. Becker (Maxeler), G. Gaydadjiev (Maxeler)

Abstract

Maxeler Technologies successfully commercialises high-performance computing systems based on dataflow technology. Maxeler dataflow computers have been deployed in a wide range of application domains including financial data analytics, geoscience and low-latency transaction processing. In the context of cloud computing’s steadily growing acceptance in new domains, we illustrate how Maxeler dataflow systems can be integrated and employed in a self-organising, self-managing heterogeneous cloud environment.

Open Access: No

Read

On the power consumption modeling for the simulation of Heterogeneous HPC clouds

Authors

K. Giannoutakis (CERTH), A. Makaratzis (CERTH), D. Tzovaras (CERTH), C. K. Filelis-Papadopoulos (DUTH), G. A. Gravvanis (DUTH)

Abstract

During the last years, in addition to traditional CPU-based hardware servers, hardware accelerators have been widely used in various HPC application areas. More specifically, Graphics Processing Units (GPUs), Many Integrated Cores (MICs) and Field-Programmable Gate Arrays (FPGAs) have shown great potential in HPC and have been widely adopted in supercomputing. With the adoption of HPC by cloud environments, the realization of HPC clouds is evolving, since many vendors provide HPC capabilities on their clouds. With the increase of interest in clouds, there has been an analogous increase in cloud simulation frameworks.

Cloud simulation frameworks offer a controllable environment for experimentation with various workloads and scenarios, while providing several metrics such as server utilization and power consumption. For providing these metrics, cloud simulators propose mathematical models that estimate the behavior of the underlying hardware infrastructure. This paper focuses on the power consumption modeling of the main compute elements of heterogeneous HPC servers, i.e. CPU servers and CPU-accelerator pairs. The modeling approaches of existing cloud simulators are examined and extended, while new models are proposed for estimating the power consumption of accelerators.

Open Access: No

Read

CloudLightning Simulation and Evaluation Roadmap

Authors

C. K. Filelis-Papadopoulos (DUTH), G. A. Gravvanis (DUTH), J. P. Morrison (UCC)

Abstract

The CloudLightning (CL) system, designed in the frame of the CloudLightning project, is a service-oriented architecture for the emerging large scale heterogeneous cloud. It facilitates a clear distinction between service-lifecycle management and resource-lifecycle management. This separation of concerns is used to make resource management issues tractable at scale and to enable functionality that is currently not naturally covered by the cloud paradigm. In particular, the CL project seeks to maximize the computational efficiency of the cloud in a number of specific ways: by exploiting prebuilt HPC environments, by dynamically building HPC instances, by improving server utilization, by reducing power consumption and by improving service delivery. Given the scale and complexity of this project, its utility can presently only be measured through simulation. This paper outlines the parameters, constraints and limitations being considered as part of the design and construction of that simulation environment.

Open Access: No

Read

Elastic Cloud Services Compliance with Gustafson’s and Amdahl’s Laws

Authors

S. Ristov, R. Prodan, D. Petcu (IeAT)

Abstract

The speedup that can be achieved with parallel and distributed architectures is limited by at least two laws: Amdahl’s and Gustafson’s. The former limits the speedup to a constant value when a fixed-size problem is executed on a multiprocessor, while the latter limits the speedup to at most a linear value for fixed-time problems, meaning that it is bounded by the number of processors used. However, a superlinear speedup (a speedup greater than the number of processors used) can be achieved due to insufficient memory, while parallel and, especially, distributed systems can even slow down the execution due to communication overhead, when compared to the sequential one. Since cloud performance is uncertain and can be influenced by available memory and networks, in this paper we investigate whether it follows the same speedup pattern as other traditional distributed systems. The focus is to determine how elastic cloud services behave in different scaled environments. We define several scaled systems and model the corresponding performance indicators. The analysis shows that both laws limit the speedup for a specific range of input parameters and type of scaling. Even more, the speedup in cloud systems follows Gustafson’s extreme cases, i.e. the insufficient-memory and communication-bound domains.
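For reference, the two bounds referred to above take their standard textbook forms (with $p$ the parallelizable fraction of the work and $N$ the number of processors; this is the conventional notation, not necessarily the paper's):

```latex
S_{\mathrm{Amdahl}}(N) = \frac{1}{(1-p) + p/N} \le \frac{1}{1-p},
\qquad
S_{\mathrm{Gustafson}}(N) = (1-p) + pN \le N .
```

Superlinear speedup ($S > N$) falls outside both bounds and, as the abstract notes, typically stems from memory effects rather than a contradiction of the laws.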

Open Access: Gold

Read

Characterization of hardware in self-managing self-organizing Cloud environment

Authors

C.K. Filelis-Papadopoulos (DUTH), E.N.G. Grylonakis, P.E. Kyziropoulos, G.A. Gravvanis (DUTH) and J.P. Morrison (UCC)

Abstract

During the last decade, multiple applications and services have been migrated to Cloud computing infrastructures. Cloud infrastructures offer flexibility in terms of the variety of applications they can service. Moreover, integrity of data and virtually unlimited storage space are attractive features, especially for end-users requiring massive amounts of storage. Recently, many HPC applications have also been migrated to the Cloud. Such applications include Oil and Gas exploration, Genomics and Ray-tracing. However, the underutilization of computational resources, as well as the choice of adequate computational equipment as a function of input data, computational work, pricing and energy consumption, poses a major problem in modern Cloud environments. A technique for the characterization of hardware with respect to application and hardware parameters, e.g. computational efficiency versus power consumption, is proposed. The technique is based on indexes built upon ratios to baseline hardware with respect to three of the applications involved in the CloudLightning project: Oil and Gas, Ray-Tracing, and Dense and Sparse Matrix Computations.
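The ratio-to-baseline idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual index: the node names, metrics and figures below are made up, and a real characterization would use measured application benchmarks rather than a single GFLOPS/W number.

```python
# Hypothetical performance-per-watt figures; all values are invented.
baseline = {"gflops": 100.0, "watts": 200.0}

nodes = {
    "cpu_node": {"gflops": 120.0, "watts": 250.0},
    "gpu_node": {"gflops": 900.0, "watts": 600.0},
}

def efficiency_index(node, base):
    """Index of a node: its GFLOPS/W expressed as a ratio to the baseline's GFLOPS/W."""
    return (node["gflops"] / node["watts"]) / (base["gflops"] / base["watts"])

# Rank hardware by the index; an index > 1 means better than the baseline.
ranking = sorted(nodes, key=lambda n: efficiency_index(nodes[n], baseline), reverse=True)
print(ranking)  # → ['gpu_node', 'cpu_node']
```

In practice one such index would be computed per application class (e.g. Ray-Tracing vs. sparse matrix computations), since the ranking can differ between workloads.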

Open Access: No

Read

On Issues Concerning Cloud Environments in Scope of Scalable Multi-Projection Methods

Authors

B.E. Moutafis, C.K. Filelis-Papadopoulos (DUTH), G.A. Gravvanis (DUTH) and J.P. Morrison (UCC)

Abstract

Over the last decade, Cloud environments have gained significant attention from the scientific community, due to their flexibility in the allocation of resources and the variety of applications hosted in such environments. Recently, high performance computing applications have been migrating to Cloud environments. Efficient methods are sought for solving the very large sparse linear systems occurring in various scientific fields such as Computational Fluid Dynamics, N-Body simulations and Computational Finance. Herewith, the parallel multi-projection type methods are reviewed and discussions concerning the implementation issues for IaaS-type Cloud environments are given. Moreover, phenomena occurring due to the “noisy neighbor” problem, varying interconnection speeds as well as load imbalance are studied. Furthermore, the level of exposure of specialized hardware residing in modern CPUs through the different layers of software is also examined. Finally, numerical results concerning the applicability and effectiveness of multi-projection type methods in Cloud environments based on OpenStack are presented.

Open Access: No

Read

An approach for scaling cloud resource management

Authors

D.C. Marinescu, A. Paya, J. P. Morrison (UCC), S. Olariu

Abstract

Given its current development trajectory, the complexity of cloud computing ecosystems is evolving to a point where traditional resource management strategies will struggle to remain fit for purpose. These strategies have to cope with ever-increasing numbers of heterogeneous resources, a proliferation of new services, and a growing user-base with diverse and specialized requirements. This growth not only significantly increases the number of parameters needed to make good decisions, it increases the time needed to take these decisions. Consequently, traditional resource management systems are increasingly prone to poor decision making. Devolving resource management decisions to the local environment of each resource can dramatically increase the speed of decision making; moreover, the cost of gathering global information can thus be eliminated, saving communication costs. Experimental data, provided in this paper, illustrate that extant cloud deployments can be used as effective vehicles for devolved decision making. This finding strengthens the case for the proposed paradigm shift, since it does not require a change to the architecture of existing cloud systems. This shift would result in systems in which resources decide for themselves how best they can be used. This paper takes this idea to its logical conclusion and proposes a system for supporting self-managing resources in cloud environments. It introduces the concept of coalitions, consisting of collaborating resources, formed for the purpose of service delivery. It suggests the utility of restricting the interactions between the end-user and the cloud service provider to a well-defined services interface. It shows how clouds can be considered functionally, as engines for delivering an appropriate set of resources in response to service requests. Finally, since modern applications are increasingly constructed from sophisticated workflows of complex components, it shows how combinatorial auctions can be used to effectively deliver packages of resources to support those workflows.

Open Access: No

Read

Topics in cloud incident management

Authors

T. F. Fortis (IeAT), V. I. Munteanu (UVT)

Abstract

Continuous advancement of cloud technologies, alongside their ever-increasing stability, adoption, and ease of use, has led to a rise in native cloud applications, possibly deployed over a larger pool of heterogeneous resources or in multi-cloud approaches. This, in turn, has brought unprecedented levels of complexity in the context of cloud computing. Such complexity may cause a series of events and incidents that are difficult to intercept or manage in time, in a manner that also ensures the overall Quality of Service and compliance with existing Service-Level Agreements. Our special issue presents advances in several key areas that are highly relevant for automated cloud incident management: a ‘continuous approach’ for reliable cloud native applications, novel approaches for Metal-as-a-Service centered around an advanced reservation system, and the development of a framework based on the concept of secure SLAs, in order to deal with specific cloud security issues.

Open Access: Gold

Read

Characterizing numascale clusters with GPUs: MPI-based and GPU interconnect benchmarks

Authors

M. M. Khan (NTNU), A. C. Elster (NTNU)

Abstract

Modern HPC clusters are increasingly heterogeneous, both in processor types and in the topologies of their computing, communication and storage resources. In this paper, we describe how to use benchmarking to characterize the high-speed interconnect, NumaConnect, of a shared-memory Numascale cluster system with GPUs, constituting a novel testbed at NTNU. Numascale systems include a unique node controller, NumaConnect, based on the FPGA- or ASIC-based NumaChip, depending on system vendor requirements. The system’s interconnect uses AMD’s HyperTransport protocol and provides a cache-coherent shared-memory single-image operating system. Our system has, in addition, a GPU added to each server blade. Our characterization efforts target the NumaConnect, which includes an RDMA-type Block Transfer Engine (BTE). The BTE is used by byte transfer layers such as the NumaConnect BTL (NC-BTL) for message passing (MPI) or BLACS. To characterize our Numascale system, we use several benchmark suites, including our own SimpleBench, which includes ping-pong, MPI-Reduce and MPI-Barrier tests; two well-known MPI benchmark suites, the NAS Parallel Benchmarks (NPB)-MPI and the OSU microbenchmarks; as well as Nvidia’s bandwidth test for GPUs. Our results show that it is generally very beneficial to use MPI or other libraries that use the NC-BTL library. In fact, on selected OSU and NPB benchmarks, we achieve order-of-magnitude improvements in communication and synchronization costs when using NC-BTL.

Open Access: No

Read

Machine learning-based auto-tuning for enhanced performance portability of OpenCL applications

Authors

T. L. Falch (NTNU), A. C. Elster (NTNU)

Abstract

Heterogeneous computing, combining devices with different architectures such as CPUs and GPUs, is rising in popularity and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems and offers functional portability. However, it suffers from poor performance portability, because applications must be retuned for every new device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the tuning parameter spaces, and the results are used to build a machine learning-based performance model. The model can then be used to find interesting subspaces for further search. We evaluate our method using five image processing benchmarks, with tuning parameter space sizes up to 2.3 M, using different input sizes, on several devices, including an Intel i7 4771 (Haswell) CPU, an Nvidia Tesla K40 GPU, and an AMD Radeon HD 7970 GPU. We compare different machine learning algorithms for the performance model. Our model achieves a mean relative error as low as 3.8% and is able to find solutions on average only 0.29% slower than the best configuration in some cases, evaluating less than 1.1% of the search space. The source code of our framework is available at https://github.com/acelster/ML-autotuning.
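The sampling-then-modelling loop described in the abstract can be sketched roughly as follows. This is a toy reconstruction under our own assumptions, not the authors' framework (their code is linked above): the parameter space, the synthetic `measure` function and the 1-nearest-neighbour predictor are all stand-ins, whereas the real system benchmarks OpenCL kernels and compares several machine learning models.

```python
import random

# Hypothetical tuning-parameter space: (work-group size, unroll factor).
space = [(bs, u) for bs in (8, 16, 32, 64) for u in (1, 2, 4)]

def measure(cfg):
    """Stand-in for running the real benchmark with configuration cfg;
    returns a synthetic runtime with an optimum at (32, 2)."""
    bs, u = cfg
    return abs(bs - 32) * 0.5 + abs(u - 2) * 1.0 + 1.0

def predict(cfg, samples):
    """Trivial 1-nearest-neighbour performance model over measured samples."""
    nearest = min(samples, key=lambda s: (s[0][0] - cfg[0]) ** 2 + (s[0][1] - cfg[1]) ** 2)
    return nearest[1]

random.seed(0)
# Step 1: benchmark a random subset of the parameter space.
train = [(c, measure(c)) for c in random.sample(space, 6)]

# Step 2: rank the whole space by predicted runtime; only the most
# promising subspace (here: top 3) would be searched further.
candidates = sorted(space, key=lambda c: predict(c, train))[:3]
print(candidates)
```

The key saving is that only the sampled subset plus the short candidate list is ever benchmarked, rather than the full space, which is what lets the authors' approach evaluate under 1.1% of spaces with millions of configurations.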

Open Access: Yes

Read

Exposing HPC services in the Cloud: the CloudLightning Approach

Authors

I. Dragan (IeAT), T. F. Fortis (IeAT), M. Neagul (IeAT)

Abstract

Nowadays we are noticing important changes in the way High Performance Computing (HPC) providers are dealing with demand. The growing requirements of modern data- and compute-intensive applications call for new models for their development, deployment and execution. New approaches related to Big Data and peta- and exa-scale computing are going to dramatically change the design, development and exploitation of highly demanding applications, such as HPC ones. Due to the increased complexity of these applications and their outstanding requirements, which cannot be supported by the classical centralized cloud models, novel approaches, inspired by autonomic computing, are investigated as an alternative. In this paper, we offer an overview of such an approach, undertaken by the CloudLightning initiative. In this context, a novel cloud delivery model that offers the capabilities to describe and deliver dynamic and tailored services is being considered. This new delivery model, based on a self-organizing and self-managing approach, will allow provisioning and delivery of coalitions of heterogeneous cloud resources, built on top of the resources hosted by a cloud service provider.

Open Access: Gold

Read

A Cloud Reservation System for Big Data Applications

Authors

D. Marinescu, A. Paya, J. Morrison (UCC)

Abstract

Emerging Big Data applications increasingly require resources beyond those available from a single server and may be expressed as a complex workflow of many components and dependency relationships – each component potentially requiring its own specific, and perhaps specialized, resources for its execution. Efficiently supporting this type of Big Data application is a challenging resource management problem for existing cloud environments. In response, we propose a two-stage protocol for solving this resource management problem. We exploit spatial locality in the first stage by dynamically forming rack-level coalitions of servers to execute a workflow component. These coalitions only exist for the duration of the execution of their assigned component and are subsequently disbanded, allowing their resources to take part in future coalitions. The second stage creates a package of these coalitions, designed to support all the components in the complete workflow. To minimize the communication and housekeeping overhead needed to form this package of coalitions, the technique of combinatorial auctions is adapted from market-based resource allocation. This technique has a considerably lower overhead for resource aggregation than the traditional hierarchically organized models. We analyze two strategies for coalition formation: the first, history-based, uses information from past auctions to pre-form coalitions in anticipation of predicted demand; the second, just-in-time, builds coalitions only when support for specific workflow components is requested.
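To make the combinatorial-auction idea concrete, here is a minimal sketch of winner determination over coalition bids, under our own assumptions (the server names, bid values and the greedy value-density heuristic are illustrative; the paper's actual protocol and allocation rule may differ, and optimal winner determination is NP-hard in general):

```python
# Hypothetical coalition bids: (set of servers requested, offered value).
coalitions = [
    ({"s1", "s2"}, 8.0),
    ({"s2", "s3"}, 5.0),
    ({"s3", "s4"}, 6.0),
    ({"s1"}, 3.0),
]

def allocate(coalitions):
    """Greedy winner determination: consider bids in decreasing order of
    value per server, accepting each bid whose servers are still free."""
    used, winners = set(), []
    for servers, bid in sorted(coalitions, key=lambda c: c[1] / len(c[0]), reverse=True):
        if servers.isdisjoint(used):
            winners.append((servers, bid))
            used |= servers
    return winners

print(allocate(coalitions))
# → [({'s1', 's2'}, 8.0), ({'s3', 's4'}, 6.0)]
```

Because each server can belong to at most one winning coalition, the accepted bids form a non-overlapping package of coalitions – the analogue of the workflow-supporting package described in the abstract.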

Open Access: No

Read

Supporting Heterogeneous Pools in a Single Ceph Storage Cluster

Authors

S. Meyer (UCC), J. Morrison (UCC)

Abstract

In a general-purpose cloud system, efficiencies have yet to be gained from supporting diverse application requirements within a heterogeneous storage system. Such a system poses significant technical challenges, since storage systems are traditionally homogeneous. This paper uses the Ceph distributed file system, and in particular its concept of storage pools, to show how a storage solution can be partitioned to provide the heterogeneity needed to support these diverse application requirements.

Open Access: No

Read

Benchmarking the Numascale Shared Memory Cluster System with MPI

Authors

M. Khan (NTNU), A. Elster (NTNU)

Abstract

Modern clusters and HPC systems are becoming increasingly complex in their topology and in the heterogeneity of the computing devices used. Additionally, they are affected by energy usage and interconnect performance. This situation exposes parallel applications to the difficult challenge of taking maximum advantage of the superior computing potential offered by such systems. MPI has been one of the most popular models for parallel scientific applications over the years. In this paper, we present evaluation results for a couple of well-known benchmark suites, using the standard Open MPI implementation compared with the (vendor-provided) NumaConnect-specific BTL (NumaConnect byte transfer layer), on a modern shared-memory multi-GPU-based Numascale cluster system. We run a series of standard benchmark kernels and pseudo-applications that are widely used to benchmark clusters and supercomputers. In particular, we present our results from running two benchmarks, one each from the OSU and the NPB benchmark suites (iBarrier and Conjugate Gradient), as representative of our benchmarking effort with the Numascale machine. Our results show an order-of-magnitude improvement in communication and synchronization costs on standard benchmarks when using the NumaConnect-specific NC-BTL.

Open Access: Yes

Read

Reusing Resource Coalitions for Efficient Scheduling on the Intercloud

Authors

T. Selea (IeAT), A. Spataru (IeAT), M. Frincu (IeAT)

Abstract

The envisioned intercloud, bridging numerous cloud providers and offering clients the ability to run their applications on specific configurations unavailable on single clouds, poses challenges with respect to selecting the appropriate resources for deploying VMs. Reasons include the large distributed scale and VM performance fluctuations. Reusing previously “successful” resource coalitions may be an alternative to the brute-force search employed by many existing scheduling algorithms. The reason for reusing resources is motivated by an implicit trust in previous successful executions that have not experienced the VM performance fluctuations described in many research papers on cloud performance. Furthermore, the data deluge coming from services monitoring the load and availability of resources forces a shift in traditional centralized and decentralized resource management by emphasizing the need for edge computing. In this way, only metadata is sent to the resource management system for resource matchmaking. In this paper we propose a bottom-up monitoring architecture and a proof-of-concept platform for scheduling applications based on resource coalition reuse. We consider static coalitions and neglect any interference from other coalitions by considering only the historical behavior of a particular coalition and not the overall state of the system, past or present. We test our prototype on real traces by comparing with a random approach and discuss the results, outlining its benefits as well as some future work on run-time coalition adaptation and global influences.

Open Access: No

Read

On Autonomic HPC Clouds

Authors

D. Petcu (IeAT)

Abstract

The long tail of science using HPC facilities is nowadays looking to instantly available HPC Clouds as a viable alternative to the long waiting queues of supercomputing centers. While the name HPC Cloud suggests a Cloud service, the current HPC-as-a-Service is mainly an offer of bare metal, better named cluster-on-demand. The elasticity and virtualization benefits of Clouds are not exploited by HPC-as-a-Service. In this paper we discuss how the HPC Cloud offering can be improved from a particular point of view, that of automation. After a reminder of the characteristics of the Autonomic Cloud, we project the requirements and expectations onto what we name Autonomic HPC Clouds. Finally, we point towards the expected results of the latest research and development activities related to the topics identified.

Open Access: Gold

Read

On the Next Generations of Infrastructure-as-a-Services

Authors

D. Petcu (IeAT), M. Fazio, R. Prodan, Z. Zhao, M. Rak

Abstract

Following the wide adoption of cloud computing technologies by industry, we can talk about a second generation of cloud services and products that are currently in the design phase. However, it is not yet clear what the third generation of cloud products and services of the next decade will look like, especially at the delivery level of Infrastructure-as-a-Service. In order to answer, at least partially, such a challenging question, we initiated a literature overview and two surveys involving the members of a cluster of European research and innovation actions. The results are interpreted in this paper and a set of topics of interest for the third generation is identified.

Open Access: Gold

Read

CLOUDLIGHTNING: A Framework for a Self-organising and Self-managing Heterogeneous Cloud

Authors

Lynn, T. (DCU), Xiong, H. (UCC), Elster, A. (NTNU), McGrath, M. (Intel), Khan, M. (NTNU), Kenny, D. (DCU), Becker, T. (Maxeler), Giannoutakis, K. (CERTH), Filelis-Papadopoulos, C. (DUTH), Dong, D. (UCC), Gravvanis, G. (DUTH), Gaydadjiev, G. (Maxeler), Tzovaras, D. (CERTH), Kuppuudaiyar, P. (Intel), Neagul, M. (IeAT), Momani, B. (UCC), Natarajan, S. (Intel), Petcu, D. (IeAT), Gourinovitch, A. (DCU), Dragan, I. (IeAT) and Morrison, J. (UCC)

Abstract

As clouds increase in size and as machines of different types are added to the infrastructure in order to maximize performance and power efficiency, heterogeneous clouds are being created. However, exploiting different architectures poses significant challenges. To efficiently access heterogeneous resources and, at the same time, to exploit these resources to reduce application development effort, to make optimisations easier and to simplify service deployment, requires a re-evaluation of our approach to service delivery. We propose a novel cloud management and delivery architecture based on the principles of self-organisation and self-management that shifts the deployment and optimisation effort from the consumer to the software stack running on the cloud infrastructure. Our goal is to address inefficient use of resources and consequently to deliver savings to the cloud provider and consumer in terms of reduced power consumption and improved service delivery, with hyperscale systems particularly in mind. The framework is general but also endeavours to enable cloud services for high performance computing. Infrastructure-as-a-Service provision is the primary use case, however, we posit that genomics, oil and gas exploration, and ray tracing are three downstream use cases that will benefit from the proposed architecture.

Open Access: Yes

Read