October 22, 2009 By vmturbo

Where in the world is the virtual I/O bottleneck? (I)

This two-part article considers storage IO bottlenecks in virtualized systems.

Background: Storage IO flows


Figure 1:  Physical IO Pipes

Figure 1, above, depicts the storage IO pipe through a vanilla physical infrastructure. IO operations flow from the source OS drivers, on the left, through the Host Bus Adapter (HBA) and the SAN fabric to the Host Interface Card (HIC) of the storage array, on the right, and are then delivered to the target Storage Processor (SP) for processing. The HBA, fabric and HIC use Fibre Channel (FC) protocols to assure reliable and efficient delivery of IO operations.

IO traffic through the pipe traverses a large number of processing elements, where it competes with the traffic of other pipes and is buffered until processed. This can give rise to congestion, where buffers overflow and IO frames are dropped. The FC protocols detect these losses and retransmit the frames. The result is reduced throughput and increased latency, which impairs the applications that depend on the pipe.

The FC protocols thus incorporate careful flow control mechanisms that avoid buffer overflows by limiting traffic along both hop-by-hop links and end-to-end connections. These mechanisms control traffic levels to minimize interference among competing workloads of different channels, assure buffer availability along the pipes and enable administrators to balance IO workloads across the fabric and arrays.
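
The hop-by-hop part of this idea can be illustrated with a toy model of credit-based flow control, the general scheme behind FC buffer-to-buffer credits. This is only a sketch: the class and function names, the buffer count and the drain rate are assumptions chosen for illustration, not part of any FC implementation. The sender holds one credit per receiver buffer and may transmit only while it has credits, so the receiver's queue can never exceed its buffers and no frame is dropped.

```python
from collections import deque


class CreditLink:
    """Toy model of one link with credit-based flow control (illustrative only)."""

    def __init__(self, receiver_buffers):
        # The sender starts with one credit per receiver buffer.
        self.credits = receiver_buffers
        self.rx_queue = deque()

    def send(self, frame):
        """Sender side: transmit only if a credit is available."""
        if self.credits == 0:
            return False  # the sender holds the frame; nothing is dropped
        self.credits -= 1
        self.rx_queue.append(frame)
        return True

    def process_one(self):
        """Receiver side: free one buffer and return a credit to the sender."""
        if not self.rx_queue:
            return None
        frame = self.rx_queue.popleft()
        self.credits += 1
        return frame


def run(frames, receiver_buffers, drain_every):
    """Offer one frame per tick; the receiver processes one frame every `drain_every` ticks."""
    link = CreditLink(receiver_buffers)
    pending = deque(range(frames))
    delivered = []
    max_queue = 0
    tick = 0
    while pending or link.rx_queue:
        tick += 1
        if pending and link.send(pending[0]):
            pending.popleft()
        max_queue = max(max_queue, len(link.rx_queue))
        if tick % drain_every == 0:
            frame = link.process_one()
            if frame is not None:
                delivered.append(frame)
    return delivered, max_queue, tick


delivered, max_queue, ticks = run(frames=20, receiver_buffers=4, drain_every=3)
print(len(delivered), "frames delivered in order, peak receive queue:", max_queue)
```

Even though the sender here is three times faster than the receiver, the receive queue never grows beyond the four buffers it advertises; the slow receiver simply throttles the sender.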

Virtualization is a game changer.

Consider a generic scenario of IO flows through virtualization infrastructures, depicted in Figure 2 below.


Figure 2:  Virtualized IO Pipes

The virtualized IO pipes from VMs to the array differ from the physical pipes depicted in Figure 1 in two ways:

(a)  They traverse additional hypervisor “IO links” and queues between the vHBA and HBA; and

(b)  They share common channels between VMFS, HBA and LUNs.

These seemingly innocuous distinctions have a significant impact:

(1)  FC flow control does not extend to the hypervisor’s “IO links” between the vHBA and HBA. Link-level flow control over these links, and end-to-end flow control between the vHBA and the array, shift from automated, adaptive, coordinated channel protocols to hypervisor management by virtualization administrators, in coordination with storage and application administrators.

(2)  IO flows of a given VM lose the protections of channel flow-control mechanisms and may be disrupted by IO flows of other VMs sharing their HBA, channel and LUN.

(3)  Storage array performance, too, may be disrupted when interfering IO flows randomize access patterns.

(4)  Elusive bottlenecks may emerge from short bursts (microbursts), which are hard to detect, isolate and handle.

In what follows we consider the first two of these factors in detail, leaving the last two for the second part of this article.

The Hypervisor Shifts Protocol Functions To Management Responsibilities

Consider first the role of the hypervisor in handling IO flows. The hypervisor “IO links” extend the channels of the HBA with new processing stages and buffers. Traffic along these links cannot be flow-controlled by the channel protocols. The hypervisor thus requires its own flow-control mechanisms to prevent buffer overflows on its links, as well as on end-to-end links. VMware, for example, sets strict limits on the number of IO operations that can be buffered at the hypervisor (typically 32) and requires matching configuration of the VMs (see this article or this one about storage queues and performance).
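
A rough feel for what such a limit means can be had from Little's law: the number of outstanding IOs equals throughput times latency, so a fixed queue depth caps throughput at depth divided by latency. The sketch below applies this to the limit of 32 mentioned above; the latency figures and the function name are assumptions chosen for illustration, not measurements.

```python
def max_iops(queue_depth, latency_ms):
    """Upper bound on IO operations per second for a bounded queue (Little's law)."""
    return queue_depth * 1000.0 / latency_ms


# Hypothetical per-IO latencies, in milliseconds.
for latency_ms in (5, 20):
    print(f"depth 32, {latency_ms} ms per IO -> at most {max_iops(32, latency_ms):.0f} IOPS")
```

With 32 outstanding operations, the ceiling is 6400 IOPS at 5 ms per IO and drops to 1600 IOPS at 20 ms. When several VMs share the same queue, this ceiling is shared among them, which is why the limit and the VM configurations have to be tuned together.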

This converts flow control from an automated, adaptive infrastructure protocol into a management function to be handled by virtualization administrators. Furthermore, channel flow control protocols provide end-to-end adaptive traffic control, coordinating flows along intermediate links to avoid bottlenecks. The hypervisor links do not extend this end-to-end control, increasing the likelihood of uncoordinated flows and bottleneck formation.

Virtualization administrators are thus required to monitor IO traffic flows to detect, analyze and handle disruptions, and to coordinate this work with storage and application administrators. VMware provides rich instrumentation to support this monitoring (see this article about storage analysis and monitoring and this article about vscsiStats). However, monitoring this data, analyzing it, detecting IO disruptions and resolving them can be very challenging and requires an intimate understanding of storage IO flows and of the underlying infrastructure’s operations.

Could one restore flow control over the hypervisor’s IO links to automated protocols? Recent extensions of the FC standards (discussed below) permit overlays of virtual FC channels between VMs and the array. These extensions permit channel protocols to protect end-to-end flows from the vHBA to the LUN. Indeed, vSphere supports such channels. However, this requires the raw storage access provided by RDM, which in turn means giving up the storage semantics and rich services of VMFS.

Are there other ways to restore flow control to automated mechanisms while preserving rich hypervisor services such as those provided by VMFS? This question will be considered in future posts.

We now turn to the second and more challenging problem of virtualized IO pipes.

IO Flows Can Disrupt Each Other

Consider the IO flows of multiple VMs sharing a VMFS and an HBA, as depicted in Figure 2. Suppose these VMs access different targets and each retrieves a large amount of data for processing. The storage arrays may inject these independent IO flows into the fabric over several ports and storage processors. The aggregate throughput of these flows may far exceed the capacity of the fabric port attached to the HBA. This will result in buffer overflows and frame loss, triggering retransmissions and increased latency.
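
The arithmetic of this fan-in is simple but worth spelling out. The sketch below uses made-up figures (three flows of 3, 2 and 2 Gbps converging on a 4 Gbps HBA port); the function name and all rates are assumptions for illustration only.

```python
def fan_in(flow_rates_gbps, hba_port_gbps):
    """Return offered load, oversubscription ratio and excess for flows sharing one port."""
    offered = sum(flow_rates_gbps)
    ratio = offered / hba_port_gbps
    excess = max(0.0, offered - hba_port_gbps)
    return offered, ratio, excess


# Hypothetical flows from three array ports toward a single HBA port.
offered, ratio, excess = fan_in([3.0, 2.0, 2.0], hba_port_gbps=4.0)
print(f"offered {offered} Gbps, {ratio:.2f}x oversubscribed, {excess} Gbps must queue or be lost")
```

Each flow on its own fits comfortably within the port, yet together they offer 1.75 times what it can carry. No single VM's administrator sees a problem by looking at that VM's flow alone.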

A physical infrastructure, as depicted in Figure 1, avoids such problems by dedicating physical capacity and carefully tuning IO workloads to this capacity. In contrast, the application administrators of VMs cannot be aware of the IO workloads of the other VMs that share the physical capacity with them. For example, a database application may retrieve large tables to compute their join while a security application scans a VM's storage for viruses. Interference presents a complex challenge when multiple IO-intensive applications share an HBA.

Why is interference in sharing an HBA harder to handle than interference in sharing a CPU? CPU resources are carefully scheduled by automated hypervisor mechanisms that adapt to instantaneous demand and provide guaranteed allocations. In contrast, HBA resources are scheduled through loose mechanisms managed by administrators; these do not adapt to instantaneous traffic and do not provide guaranteed allocations.

Interference and disruptions can also emerge through competitive sharing of memory resources, not just of the HBA. Consider a guest database server requiring physical memory to process large tables. The hypervisor may use ballooning to reclaim physical memory from other VMs and expand its physical memory pool. Now suppose the VMs that released this physical memory require it back. The memory available to the database server will decline. The hypervisor may swap least-recently-used (LRU) pages of the database server to its swap area. The guest OS of the database server may, too, use an LRU algorithm and choose to swap the same pages to its own swap area. The hypervisor must then swap those pages back into physical memory just so that the guest OS can copy them to its own swap area. Such interleaved swapping and ballooning can significantly disrupt multiple VMs. The database server, in particular, may be unable to handle the bursts of IO flows delivering the large tables.

One could, of course, pursue several measures to limit interference. For example, one can reduce interference over VMFS by dedicating a VMFS to IO-intensive applications (this, however, may create interference through competition of VMFS over memory resources). Similarly, one may cap IO throughput so that aggregate utilization does not exceed 30% of the HBA capacity. However, IO traffic is very bursty; even if traffic averages meet such pre-set limits, one cannot ignore the disruptive effects of bursts.
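
To see how an average can hide bursts, consider a synthetic utilization trace. The sketch below assumes a made-up workload in which the offered load is 10% of HBA capacity for nine sampling intervals and 150% for the tenth; the trace, the function name and its parameters are all illustrative assumptions.

```python
def summarize(utilization_pct, average_limit_pct=30, port_capacity_pct=100):
    """Compare the average of a utilization trace with its per-interval peaks."""
    average = sum(utilization_pct) / len(utilization_pct)
    over = [u for u in utilization_pct if u > port_capacity_pct]
    return {
        "average_pct": average,
        "meets_average_limit": average <= average_limit_pct,
        "intervals_over_capacity": len(over),
        "peak_pct": max(utilization_pct),
    }


# Hypothetical trace: nine quiet intervals followed by one burst, repeated six times.
trace = ([10] * 9 + [150]) * 6
report = summarize(trace)
print(report)
```

The trace averages 24% and so satisfies a 30% limit, yet in one interval out of ten the offered load exceeds what the port can carry. Coarser sampling intervals would hide these bursts entirely, which is part of what makes them hard to catch.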

Needless to say, one can reduce interference by limiting consolidation ratios for IO-intensive applications. Alternatively, one can over-engineer the IO pathways and memory to minimize interference. However, both approaches call into question the very reason for virtualizing IO-intensive applications. Another alternative, commonly pursued by administrators when virtualizing IO-intensive applications, is to consolidate such applications with workloads that have low IO demands, e.g., to consolidate a database server with print servers and web servers.

Recent efforts by the T11 standards committee (N_Port ID Virtualization, or NPIV) provide a promising alternative by enabling FC channels to be virtualized and extended from the vHBA to the LUN. This permits FC protocols to allocate end-to-end resources to these virtualized channels, control flows and minimize interference. These mechanisms can dramatically simplify both the interference and the flow control problems. However, two factors limit their use. First, one has to use RDM to support such virtualized FC channels and abandon the rich services offered by VMFS. Second, in an environment where VMs can move, the resources allocated to virtualized channels need to be adapted dynamically to handle the redistribution of IO workloads; this requires sophisticated management automation tools.

In conclusion, virtualization of IO flows, while seemingly involving trivial changes from physical infrastructures, introduces significant new complexities and potential disruptions of IO flows. Part II considers further challenges of IO virtualization and possible directions for resolving them.


