January 6, 2010 By vmturbo

Can Theory Shed Light On Consolidation?(II)

(This is the second in the series of two articles about workload consolidation in virtual machine environments.  The first article is linked here.)

The Bottom Line

There are several practical applications one can draw from various mathematical and computer science theories related to capacity and workload management of virtual machine environments.

  1. Merging workloads can yield significant acceleration of processing during periods of high utilization.
  2. Aggregating processing capacity can accelerate processing at all traffic levels.
  3. Alternatively, consolidating workloads and processing capacity can enable higher utilization of resources while keeping processing delays within target QoS bounds.
  4. These performance gains are higher, the higher the consolidation ratio. Consolidation of workloads across  a cluster yields higher gains than consolidation of a single host; and consolidation across a data center, or a cloud, yields higher gains than a single cluster. It is thus best to consider consolidation as an expansive hierarchy of resource sharing schedulers.
  5. These performance gains require that the consolidated workloads be maximally uncorrelated.

Merging Workloads and Aggregating Processing

Consider  the consolidation modes depicted in figure 1 below. Part (a) shows  k workload streams processed by dedicated processors; the heavy yellow workloads is queued for its service processor S1, even when the green and blue processors (S1, Sk) are not used by their respective lighter workloads. Part (b) shows workload merging into a single stream, sharing the pool of service processors.  An arriving processing task may now access any of the processors. This  eliminates scenarios   where tasks are queued while some processors are idle, as in dedicated processing.   pix2.1

Figure 1:  Consolidation Modes

Part  (c) shows capacity aggregation of service processors  into a single processor with the aggregate capacity.  An arriving task can utilize the entire aggregate capacity of all the service processors to accelerate its processing.

Example: Consolidation of Network Traffic

To illustrate the two consolidation modes, workload merging (1b) and capacity aggregation (1c), consider the network workloads of k=10 VMs. The service processors S1,S2…Sn…S10 represent 1GE NIC at the host. The workloads represent packet streams generated by the VMs sharing the hosts.  Workloads merging, depicted in (1b), is provided by virtual switch (vSwitch) that schedules packets, of all streams,  to the  10 NICs . This consolidation will produce multiplexing gains by permitting the streams to share the pool of processors and thus use their capacity more efficiently. The capacity aggregation, depicted in (1c), replaces the 10 1GE NICs with a single 10GE NIC. This consolidation will result in additional multiplexing gain as packets processing times are significantly reduced by applying the full aggregate processing capacity.

Consolidation Creates Multiplexing Gains

Merging workloads, as in (1b),  clearly uses the service processors more efficiently than dedicating processors to individual streams, as in (1a). Capacity aggregation provides additional efficiency; an arriving task can use the full aggregate capacity of the k processors, not just one of them.

These multiplexing gains can be measured by the respective reductions in delays, as defined in part I:

GM =Tdedicated/Tmerged

where Tdedicated is the average processing delay  of the dedicated streams of (1a), while  Tmerged is the average delay of the workload merging consolidation(1b).  Similarly

GA =Tdedicated/Taggregated

where Taggregated is the average delay of the capacity aggregation consolidation (1c).

What are the  multiplexing gains GM and GA of the two consolidation modes?

We will now use elementary queueing theory to address this question and gain further insights into consolidation modes.

Brief Intro to Queueing Analysis of Workload Processing

Queueing theory describes a processing service in terms of three parameters: “A/B/k”  where A describes the statistical distribution of the jobs inter-arrival times (workload); B described the distribution of service processing times; and k describes the number of servers. Additional assumptions may describe statistical independence of arrivals and service, the queueing discipline  (e.g., FIFO, LIFO ) and various service features.

For example, the  M/M/1 queueing service model describes a single server system handling a workload stream with statistically independent exponentially distributed inter-arrival times and service times. The letter “M”  indicates the memorylessness  of the exponential distribution: the future is independent of its past. M/M/k models provide a standard base to explore the performance of a workload processing services.

The exponential distribution is characterized by its rate. We use  L (load) to indicate the arrival rate and C (capacity) to indicate the service rate. The utilization of the service is defined as U=L/C,  the fraction of capacity used by arriving workload. We note in passing that when one merges k streams with exponentially distributed arrivals at rate L, the merged stream also has exponentially distributed arrivals at rate kL.

The performance of the three models, depicted in figure 1, may be analyzed using M/M/k  analysis. Assume that the workload arrivals  are  exponentially distributed with rate L; while service times are exponentially distributed with rate C. Figure (1a) depicts k independent M/M/1 systems. Figure (1b) depicts a merging of these k systems into a single M/M/k system, with arrival rate of kL. Figure (1c) depicts an M/M/1 system with aggregate arrival rate of kL and aggregate service rate of kC.  Note that the utilization U=L/C=kL/kC remains similar in all models. However, processing delays will vary greatly.

Queueing Analysis of  The Multiplexing Gains

We are now ready to analyze the multiplexing gains of the workload merging and capacity aggregation consolidation modes, depicted in figure 1.

The average delay of an M/M/1 stream, as in (1a),  is given by Tdedicated=1/(C-L).

The average delay of an M/M/k stream, as in (1b), is given by

Tmerged=(1/C)+P(k,U)/(kC-L)    where P(k,U) is the probability of queueing

For U~0 (low utilization) P(k,U)~0 and thus T0merged~1/C ; that is, the waiting time is approximately the service time. For U~1 (high utilization) P(k,U)~1 and thus T1merged~1/(kC-L).

Therefore, when U~0 (light load): Gmerged=Tdedicated/T0merged= C/(C-L)=1/(1-U) ~1;  there is no gain in merging workloads to  share processors. However, when U~1 (heavy load):  Gmerged=Tdedicated/T1merged=(kC-L)/(C-L)=(k-U)/(1-U)~k; the multiplexing gain is proportional to the consolidation ratio k.

To summarize, Gmerged~1 for light utilization, while for high utilization  Gmerged~k .

For example, if  network traffic of 10 VMs is merged by a vSwitch into 10 NICs, during heavy traffic the multiplexing gains will converge to the consolidation ratio 10. When traffic is light, gains will be minimal.

Consider now the aggregated stream of figure (1c). The average delay  is given by Taggregated=1/(kC-kL)=(1/k)Tdedicated

Therefore GA =Tdedicated/Taggregated =k

For example, suppose network traffic of 10 VMs is aggregated into a 10GE NIC . The traffic processing delays will be reduced by a factor of 10 over using 10 dedicated 1GE NICs.

Now compare workload merging with capacity aggregation modes. Under light traffic  GM ~1, that is, merging produces no gains over dedicated resources. In contrast, GA =k and thus, even under light traffic, processing is greatly accelerated. However, for high utilization GM ~GA ~k. That is, both merging provides similar gains as capacity aggregation.

Finally, one should keep in mind that theory is, at best, a valuable approximation of reality. The M/M/k analysis above is based on assumptions that may, or may not, be valid for specific real scenarios. For example, workloads may be correlated, violating the assumptions of M/M/k models. Therefore, it is best to view the theory as a quantitative underpinning of the qualitative performance behaviors and focus on these.

In summary consolidation of k  M/M/1 systems can reduces processing delays through periods of high-utilization,  by a factor similar to the consolidation ratio.

Applications

We now apply the theoretical considerations above to several resource sharing scenarios in virtualization systems.

Memory Sharing

Memory pages may be considered as processors of workloads demands. If these pages are partitioned among VMs — i.e., through reservations — the resulting dedicated processing of memory requests is described by figure (1a).  Such memory reservations can result in great inefficiencies as some VMs queue their memory pages in swap areas, while other VMs are under-utilizing their memory shares. This can have dramatic impact on performance. Therefore, it is desirable to permit sharing of unused memory. Such sharing can be handled by allocating the pool of unused memory according to “shares” of respective VMs.

If   k is the number of pages in the shared pool, the M/M/k model of figure (1b) provides a useful  approximation of this merging of memory requests streams. When utilization is high, one can expect the multiplexing gains in access delays to improve by a factor of k, over dedicated reservations. Unfortunately, when utilization increases the size of the shared pool tend to decrease and the gains in access speeds may be offset by swapping overheads. Therefore, it is useful to max the size of the shared pool to rip the benefits of consolidation. The ballooning mechanisms of VMWare are an example how the virtualization system can “steal” unused dedicated memory to increase the shared pool.

Storage IO  Sharing in SANs

Storage IO streams share the processing capacity of an HBA and SAN fabric behind it. Storage protocols partition the HBA capacity among IO channels to the storage array. Such partition of IO processing exhibits the performance of  dedicated streams as in figure (1a).

Virtualization systems, however, often merge the IO streams of multiple VMs into a single IO channel. If all VMs IO traffic is consolidated into a single IO channel, the resulting stream can exhibit the performance gains of the capacity aggregation mode of  figure (1c).  On the surface such consolidation would max the multiplexing gains.  Unfortunately, consolidation of storage IO can lead to inconsistencies with the storage array mechanisms. For example, merging of storage IO operations streams may disrupt sequential access, leading to significant slow down of storage access.

Therefore, consolidation should not be considered panacea. The benefits of multiplexing gains may be offset by limitations of other mechanisms. In the case of storage IO, the root obstacle to consolidation is merging IO streams into a single IO channel. The semantic of a “channel” expected by a storage array is inconsistent with merged streams.  The problem can be solved by maintaining separate channel identifier for each stream (virtual channel), yet sharing the underlying physical processing capacity among them. This permits the storage array to optimize the processing of each virtual channel, while benefiting from the multiplexing gains in using underlying processing resources.

Maxing Utilization vs. Delay

The discussion above focused on multiplexing gains in reducing delays. More often, one is interested in using consolidation to max utilization while keeping delays within a given target range.

This, however, can be resolved by a simple twist of the analysis above. Consider for example the capacity aggregation of figure (1c). Suppose one merges the k workload streams but aggregates the capacity of only m<k processors. The utilization of the consolidated system is U*=kL/mC=(k/m)U. Thus, the utilization gain is U*/U=k/m.

The average delay contracts from T=1/(C-L) to T*=1/(mC-kL). So G=T/T*=(mC-kL)/(C-L)=(m-kU)/(1-U).  Suppose one wishes to double the average utilization by using m=k/2. Then G=k(1/2-U)/(1-U). When U is sufficiently small, G~k/2; so utilization can be doubled, while keeping significant multiplexing gains proportional to half the consolidation ratio.

Reblog this post [with Zemanta]

Category: Theory

Posted on January 6, 2010 | Permalink | View Comments Subscribe
blog comments powered by Disqus