November 19, 2009 By vmturbo

Can Theory Shed Light on Workload Consolidation? (I)

This article is the first in a series devoted to the queueing-theoretic foundations of workload consolidation and its applications to virtualization.

Consider 3 physical machines, each dedicated to processing the workload of its own application, as depicted in figure 1 below. The workload streams arrive in bursts of processing requests, depicted by the rectangles on the arrival time-lines. The rectangles' lengths depict the durations of the bursts, while their heights depict the amount of processing needed to service them. These workload streams are queued until the service processors, depicted by ellipses, are available to process their requests. The service processors may be providing CPU, memory, storage I/O bandwidth, or network bandwidth.

Figure 1:  Dedicated Workload Processing

Suppose the three applications are consolidated into virtual machines (VMs) executing on a single host. Consolidation can generally involve two independent actions: (a) merging the three workloads into a single stream; and (b) pooling the three service processors into a single processor. For example, the service processors may be 3 separate Network Interface Cards (NICs) which are pooled, through a virtual switch, into a single service processor handling the aggregate network traffic of the 3 applications. This aggregation of workloads and processors is depicted in figure 2 below.

Figure 2:  Consolidated Workload Processing

Consolidation permits workloads to share the full service processing capacity and thus creates statistical multiplexing gains. For example, the “yellow” workload of figure 1 is queued, waiting for its dedicated processor. At the same time, the processor dedicated to the orange workload is idle. Dedicated processing does not permit the yellow workload to use the idle capacity dedicated to the orange workload. In contrast, the consolidated service of figure 2 permits any processor to be used by any workload. The yellow workload can tap the aggregate processing power of all processors and accelerate its processing.

How much multiplexing gain can consolidation provide?

In general, let T denote the average delay of dedicated processing streams and let T* denote the average delay of their consolidated stream. The multiplexing gain in delay is defined as G=T/T*. How large can the delay gain G be?

The precise answer depends on the statistical details of the system. Consider first an extreme scenario, depicted in figure 3 below. The yellow and green streams generate clustered bursts that produce long queues and delays, while the orange stream uses its processor very lightly. The consolidated system spreads the traffic evenly over the collective power of the 3 processors and eliminates queues and delays; thus T*=0, and the multiplexing gain G=T/T* is unbounded (effectively infinite).

Figure 3:  Multiplexing Gains Reduce Queueing Delays

Figure 4, in contrast, depicts a scenario with no multiplexing gains. The 3 streams consist of fully synchronized burst arrivals; all three processors are either busy or idle at the same time. The delay T* experienced by the consolidated stream is the same as the delay T of the dedicated streams. The multiplexing gain in delay is G=T/T*=1. That is, there is no multiplexing gain.

Figure 4:  No Multiplexing Gains

More generally, multiplexing gains depend on the specific statistics of the workload. The example of figure 3 achieved maximal gain because the workload bursts were uncorrelated, both over time and among the individual streams. The example of figure 4 produced no gains because the workload bursts of the individual streams were perfectly correlated.
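The two extremes can be reproduced with a toy discrete-time model. This is only a sketch under assumptions that are not in the figures themselves: every stream sends a burst of 3 work units every 6 time steps, each processor serves 1 unit per step, and the time-averaged backlog is used as a stand-in for average delay (by Little's law the two are proportional at a fixed arrival rate). The function names and parameters are hypothetical.

```python
def backlog_trace(arrivals, capacity):
    """Backlog left in the queue at the end of each time step."""
    backlog, trace = 0.0, []
    for work in arrivals:
        backlog = max(0.0, backlog + work - capacity)
        trace.append(backlog)
    return trace


def mean(values):
    return sum(values) / len(values)


def burst_stream(offset, period=6, size=3, steps=60):
    """One burst of `size` work units every `period` steps, shifted by `offset`."""
    return [size if t % period == offset else 0 for t in range(steps)]


def compare(offsets, capacity=1):
    """Return (dedicated, consolidated) time-averaged total backlog."""
    streams = [burst_stream(o) for o in offsets]
    # Dedicated: each stream has its own queue and its own processor.
    dedicated = sum(mean(backlog_trace(s, capacity)) for s in streams)
    # Consolidated: one merged stream served by the pooled capacity.
    merged = [sum(step) for step in zip(*streams)]
    consolidated = mean(backlog_trace(merged, capacity * len(streams)))
    return dedicated, consolidated


print(compare([0, 2, 4]))  # staggered bursts    -> (1.5, 0.0)
print(compare([0, 0, 0]))  # synchronized bursts -> (1.5, 1.5)
```

With staggered bursts the pooled processor always clears each burst within its time step, so the consolidated backlog is zero, as in figure 3. With synchronized bursts the consolidated backlog equals the total dedicated backlog, as in figure 4.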

Intuitively speaking, multiplexing gains arise when the workloads of the multiplexed streams are minimally correlated.

In what follows we will see that, when the workload statistics minimize correlations in time and among streams, the multiplexing gain can be proportional to the consolidation ratio.
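As a preview of this proportionality, here is a sketch under one standard textbook model. The model is an assumption made for illustration, not something this post has specified: each of n dedicated processors is an M/M/1 queue with Poisson arrivals at rate \lambda and exponential service at rate \mu > \lambda, the streams are independent, and consolidation merges the n streams and pools the n processors into a single processor of rate n\mu. The symbols n, \lambda and \mu are introduced here only for this example.

```latex
T = \frac{1}{\mu - \lambda}, \qquad
T^{*} = \frac{1}{n\mu - n\lambda} = \frac{T}{n}, \qquad
G = \frac{T}{T^{*}} = n .
```

Under these assumptions the delay gain equals the consolidation ratio n. The result relies on the streams being independent; correlated streams would not merge into a Poisson stream, and the gain could be smaller.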

It  is useful to first illustrate some of the practical implications of this rule.

Example 1: Consolidation of Network Traffic

Network traffic of VMs is typically aggregated by a virtual switch (vSwitch). The vSwitch schedules the merged traffic streams onto the physical NICs, so that the traffic shares the aggregate capacity of the NICs. Network traffic streams are typically minimally correlated. Therefore, aggregation of NICs and traffic typically yields significant multiplexing gains over dedicated NICs.

Example 2: Load Balancing

Load balancing is essentially a consolidation mechanism. The load balancer creates a pool of the processing resources and schedules their allocation to the aggregate stream. The load balancer realizes multiplexing gains to the extent that the workload streams are minimally correlated.

The size of the multiplexing gains can be proportional to the amount of resources pooled by the load balancer. For example, the Distributed Resource Scheduler (DRS) of VMware consolidates the resources of a cluster and can thus yield substantially higher multiplexing gains than consolidation of workloads into a single physical host. By the same token, a load balancer that pools the resources of an entire data center, or a cloud, can yield substantially higher gains than balancing loads over a single cluster.

Example 3: Database Applications Services

Consider a virtualized database server consolidated with its applications to share the same physical host. Suppose the applications read and write files based on data manipulations by the database server. For example, the database server may retrieve a large table into memory and then provide an application with a large share of this data, which the application uses to update large files it retrieves from storage.

These computational activities can create a high degree of correlation among the respective workloads. For example, the application may generate a burst demand for memory in order to store the data provided by the server, just when the database server produces a burst demand for memory to retrieve sections of the table. Similarly, the application will generate a burst demand for storage I/O, to read its large files, just when the database server is producing a burst demand for I/O to retrieve the tables.

These correlations can reduce, or even eliminate, multiplexing gains. Worse, as we will discuss in subsequent posts, the correlations may increase the peak bursts of the workload to the point of saturating the respective resources.

Clearly, to avoid such contention for shared resources, one needs to ensure that correlations are minimized. For example, one may relocate database applications involving a high degree of I/O to a separate physical host.
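One simple way to check whether two workloads are likely to contend is to compare their demand traces before placing them on the same host. The sketch below is an illustration under assumptions: the traces are made-up storage I/O samples, the names db_io, app_io and web_io are hypothetical, and the Pearson correlation coefficient is used as one possible measure of how strongly bursts coincide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length demand traces."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def peak_demand(xs, ys):
    """Largest combined demand in any single sample interval."""
    return max(x + y for x, y in zip(xs, ys))


# Made-up I/O demand samples (arbitrary units), for illustration only.
db_io = [1, 9, 8, 1, 1, 9, 8, 1]   # database server
app_io = [2, 8, 9, 2, 2, 8, 9, 2]  # application that bursts with the database
web_io = [9, 1, 1, 8, 9, 1, 1, 8]  # unrelated workload that bursts when the database is quiet

for name, trace in (("app", app_io), ("web", web_io)):
    print(name, round(pearson(db_io, trace), 2), peak_demand(db_io, trace))
```

In this toy data the database and its application are strongly positively correlated and their combined peak is 17 units, while pairing the database with the unrelated workload gives a strongly negative correlation and a combined peak of only 10 units.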


We continue this discussion in Part II, where we provide quantitative analysis of these multiplexing gains and consider their additional applications.

Category: Theory
