TurboTalk
Management in the Age of Virtualization
This post continues Part I and considers two additional sources of potential I/O bottlenecks in virtualized environments: randomization of access and microbursting.
RANDOMIZATION OF STORAGE ACCESS
Storage performance can vary by orders of magnitude between sequential and random access. Sequential access rates are bound by transfer rates. For example, a storage device supporting a 500MBps transfer rate can handle a sequential stream of 8KB records at some 500,000/8 = 62,500 IOps (I/O operations per second). In contrast, random access rates are bound by the average seek time. For example, a storage device with an average seek time of 5ms can handle only 1/0.005 = 200 purely random IOps, a mere 0.3% of the sequential access rate above.
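The arithmetic above is easy to reproduce. The following is a minimal sketch using the figures quoted in the paragraph; the function names are illustrative, and it assumes decimal units (1 MB = 1,000 KB) and one seek per random operation, as the text does.

```python
def sequential_iops(transfer_rate_mb_per_s: float, record_kb: float) -> float:
    """IOps bound by the transfer rate (assumes 1 MB = 1,000 KB, as in the text)."""
    return transfer_rate_mb_per_s * 1000 / record_kb


def random_iops(avg_seek_s: float) -> float:
    """IOps bound by the average seek time: one operation per seek."""
    return 1 / avg_seek_s


seq = sequential_iops(500, 8)  # 500MBps stream of 8KB records
rnd = random_iops(0.005)       # 5ms average seek time

print(f"sequential: {seq:,.0f} IOps")
print(f"random:     {rnd:,.0f} IOps")
print(f"random / sequential: {rnd / seq:.2%}")
```

Real devices add rotational latency, caching and command queueing on top of this, so the numbers should be read as rough bounds rather than predictions.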
Databases and file systems have thus been designed to optimize access rates through sequential organization of stored data. Storage arrays, likewise, incorporate sophisticated scheduling mechanisms to minimize the penalties of random access and optimize sequential access. In particular, I/O operations are queued and scheduled to minimize access time.
Figure 2 of part I, repeated below, depicts a virtualized storage I/O pipe. A central function of the virtualized pipe is to consolidate I/O workloads. The consolidated flow interleaves the I/O operations of different VMs. Thus, even if each VM generates a stream of perfectly sequential access requests, the consolidated stream may require the storage system to handle purely random access. 
Figure 3: Interleaving of VM I/O Streams
To illustrate the effect of interleaving, consider an idealized worst-case scenario of 8 VMs, as in the figure. Each VM generates a perfectly sequential stream of I/O operations. These I/O operations are perfectly interleaved by the virtualized I/O pipes to target the same spindle. The storage system will see these interleaved requests as purely random accesses. This penalizes the I/O operations with both random access delays and queueing delays caused by the interfering streams.
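The worst case described above can be mimicked with a toy model. The sketch below is an illustration under stated assumptions, not a model of any real hypervisor or array: each VM reads consecutive blocks from its own disjoint region, the pipe merges the streams in strict round-robin order, and a "seek" is counted whenever a request does not target the block immediately after the previous one. The region size and request count are arbitrary illustrative values.

```python
def interleave_round_robin(streams):
    """Merge per-VM request streams one request at a time, in round-robin order."""
    merged = []
    for requests in zip(*streams):
        merged.extend(requests)
    return merged


def count_seeks(blocks):
    """Count requests that do not target the block right after the previous one."""
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur != prev + 1)


NUM_VMS = 8            # as in the figure
REQUESTS_PER_VM = 100  # illustrative value
REGION_SIZE = 10_000   # illustrative: each VM owns a disjoint block range

# Each VM reads consecutive blocks within its own region of the spindle.
streams = [
    [vm * REGION_SIZE + i for i in range(REQUESTS_PER_VM)]
    for vm in range(NUM_VMS)
]

per_vm_seeks = [count_seeks(s) for s in streams]
merged = interleave_round_robin(streams)
merged_seeks = count_seeks(merged)

print("seeks within each VM stream:", per_vm_seeks)
print(f"seeks in consolidated stream: {merged_seeks} of {len(merged) - 1} transitions")
```

In this model no individual stream ever seeks, yet every transition in the consolidated stream is a seek. A real array's scheduler would reorder queued requests and recover some of the lost sequentiality.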
More generally, I/O workload consolidation can randomize sequential storage accesses by interleaving them. The degree of randomization depends on a large number of factors, ranging from the statistics of I/O workloads of VMs, to the queueing and scheduling mechanisms of the storage array.
One can reduce interleaving effects by separating competing I/O workloads so that they target different spindles. This requires careful tuning and allocation of I/O traffic among different VMFS volumes, LUNs and hypervisors.
Alternatively, one can largely eliminate the impact of randomization by exploiting emerging Enterprise Flash Drive (EFD) storage systems. Flash storage can reduce seek time to the sub-ms range, e.g., 0.1ms. At 0.1ms average seek time, the rate of random accesses to storage is 1/0.0001 = 10,000 IOps, which is within an order of magnitude of the IOps rates of purely sequential access. Indeed, performance experiments with EFD storage arrays have been reported to sustain over 350,000 random access IOps by an ESX server.
MICROBURSTING
A multi-Gbps I/O link can generate large bursts of traffic during short durations. For example, an 8Gbps link can carry 125,000 I/O operations of 8KB each per second. A 4ms microburst over this link may generate some 500 I/O operations. Such microbursts may exceed the buffer capacity along the virtualized I/O pipe, resulting in buffer overflows, losses and increased latency.
Put differently, an I/O pipe of 8Gbps with 10ms end-to-end latency may need to store a (bandwidth)x(delay) product of 1,250 I/O operations in its buffers. Furthermore, these I/O operations may not be distributed uniformly through the buffers, but may concentrate at some bottleneck links. Microbursts may saturate these bottleneck queues, resulting in losses.
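The burst size and the bandwidth-delay product are the same calculation applied to different time intervals. Here is a minimal sketch of that arithmetic, assuming decimal units (1 Gbps = 10^9 bits per second, 1 KB = 1,000 bytes) and ignoring protocol overhead; the function names are illustrative.

```python
def link_iops(link_gbps: float, io_size_kb: float) -> float:
    """I/O operations per second a link can carry (decimal units, 8 bits per byte)."""
    kb_per_s = link_gbps * 1e9 / 8 / 1e3
    return kb_per_s / io_size_kb


def ios_in_interval(link_gbps: float, io_size_kb: float, interval_s: float) -> float:
    """I/O operations the link can emit, or hold in flight, during interval_s seconds."""
    return link_iops(link_gbps, io_size_kb) * interval_s


rate = link_iops(8, 8)                # 8Gbps link, 8KB operations
burst = ios_in_interval(8, 8, 0.004)  # 4ms microburst
bdp = ios_in_interval(8, 8, 0.010)    # 10ms end-to-end latency

print(f"link rate:               {rate:,.0f} IOps")
print(f"4ms microburst:          {burst:,.0f} I/O operations")
print(f"bandwidth-delay product: {bdp:,.0f} I/O operations")
```

Comparing these figures against the queue depths actually configured along the pipe gives a first rough indication of where a burst would overflow.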
Microbursting has been known to disrupt traffic in TCP/IP networks (e.g., see microbursting impact on financial networks). Advanced routers deploy traffic shapers to detect and manage microbursting by spreading bursts. Detecting microbursting may be challenging, as standard tools typically monitor averages over time periods much longer than the duration of a burst and may therefore miss the bursts.
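To see why averaging hides bursts, consider a synthetic trace. The sketch below is purely illustrative: the trace values are invented for the example (a steady 10 I/O operations per ms, plus one 4ms burst at 125 operations per ms, the rate of an 8Gbps link carrying 8KB operations), and the function name is hypothetical.

```python
def windowed_rates(arrivals_per_ms, window_ms):
    """Average IOps over consecutive windows of window_ms milliseconds."""
    rates = []
    for start in range(0, len(arrivals_per_ms), window_ms):
        window = arrivals_per_ms[start:start + window_ms]
        rates.append(sum(window) * 1000 / len(window))
    return rates


# Synthetic one-second trace at 1ms resolution (invented numbers).
trace = [10] * 1000
for ms in range(500, 504):
    trace[ms] = 125  # 4ms microburst

coarse = windowed_rates(trace, 1000)  # one-second averages
fine = windowed_rates(trace, 1)       # one-millisecond averages

print(f"peak IOps seen at 1s granularity:  {max(coarse):,.0f}")
print(f"peak IOps seen at 1ms granularity: {max(fine):,.0f}")
```

At one-second granularity the burst barely moves the average, while at one-millisecond granularity the same trace shows the link running at full rate.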
A recent article by Chad Sakac provides an excellent analysis of microbursting behaviors of storage I/O in virtualization systems. An interesting question is which buffers along the I/O pipe absorb the microbursts and overflow: the array, fabric or hypervisor queues? The answer, of course, depends on the specific configuration and buffer sizes. A subsequent article reports measurements of overflows of the hypervisor’s LUN queues; for the scenario considered, these overflows were sufficiently rare to be negligible.
Practically speaking, administrators must protect high-speed virtualized I/O pipes against potential microbursting. In particular, they need to configure buffers along the pipe, detect microbursts and the buffers they saturate, and shift VMs and I/O traffic to reduce the pressure on these buffers.
CONCLUSIONS
Virtualization of I/O pipes can give rise to complex potential bottlenecks through interference among consolidated I/O workloads. Interference arises in several forms: (a) competition among traffic streams over shared resources along the I/O pipe; (b) randomization of interleaved sequential access; and (c) condensation of traffic into microbursts.
Emerging NPIV technologies may ease traffic interference by extending FC protocols to support end-to-end flow control between guest OSes and storage arrays. This will allow the flow control and traffic management mechanisms of FC to regulate and reduce traffic interference. Emerging EFD storage technologies accelerate random access and can thus mitigate the randomization of consolidated I/O workloads. Managing microbursting may become important as higher-bandwidth I/O infrastructures are deployed. This may require bandwidth management technologies analogous to those used in high-speed TCP/IP networks.
Regardless of these advances, virtualization system administrators are likely to remain tasked with I/O performance management. This presents complex challenges, not the least of which is coordinating the management of I/O-intensive applications and traffic among virtualization administrators, storage administrators and application administrators.
Category: Performance