John Gannon

June 30, 2010 By John Gannon

VMware CPU ready, virtual machine rightsizing…and donuts?

[Image: Newest Donut In Town, by uncleboatshoes via Flickr]

Anyone looking for a virtualization moment of zen today needed to look no further than Cody Bunch of ProfessionalVMware.com, who opined on Twitter:

%RDY %RDY %RDY – It’s like having customers waiting for donuts. When you have more customers than donuts, you have a problem. #donutzen

This is a great way to describe the CPU co-scheduling problems that can crop up when using VMware, and based on what we have seen in the field, these problems are quite common. The co-scheduling problem arises when an ESX Server needs to schedule multiple physical processors to service a virtual symmetric multiprocessor (vSMP), that is, a virtual machine with more than one virtual CPU (vCPU). To emulate the semantics of a physical SMP, the vSMP's vCPUs must be co-scheduled: run on physical processors at the same time.

So, if your SQL Server VM is a vSMP with 4 vCPUs, it will need 4 physical processor cores in order to execute. The ESX co-scheduler will first try to run it even if 4 cores are not available. However, as soon as the VM hits an event that requires all 4 vCPUs, ESX places it in the CPU ready queue until 4 cores are free to service all 4 vCPUs at once. The vSMP may remain in the ready queue for a long time, because VMs that need only 1 vCPU can grab each core as soon as it frees up, starving the larger VM. (ESX 3 and later relax these co-scheduling rules; see Sean Clark's comment below.)
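To make the starvation scenario concrete, here is a toy simulation. It is a deliberate simplification, not ESX's actual scheduler: it assumes strict co-scheduling (all of a VM's vCPUs must start on free cores in the same tick), a fixed pool of cores, and a steady stream of 1-vCPU VMs that grab any core the moment it frees up until the "rush" ends. The function name `strict_cosched_wait` and its parameters are hypothetical.

```python
from typing import List, Optional


def strict_cosched_wait(width: int, busy: List[int], job_len: int,
                        rush_ends: int, max_ticks: int = 1000) -> Optional[int]:
    """Return the tick at which a `width`-vCPU VM can finally start, or None.

    Toy model of *strict* co-scheduling (an assumption, not ESX internals):
    the VM runs only when at least `width` cores are free in the same tick.
    busy[i] is the number of ticks until core i frees up. Until tick
    `rush_ends`, every free core is immediately taken by a new 1-vCPU VM
    that runs for `job_len` ticks.
    """
    busy = list(busy)  # don't mutate the caller's list
    for tick in range(max_ticks):
        free = [i for i, left in enumerate(busy) if left == 0]
        if len(free) >= width:
            return tick  # enough cores are free at once: the vSMP runs
        if tick < rush_ends:
            for i in free:
                busy[i] = job_len  # a 1-vCPU VM grabs the core first
        busy = [max(left - 1, 0) for left in busy]  # advance one tick
    return None  # starved for the whole simulated horizon


# Four cores that free up one at a time, 4-tick single-vCPU jobs.
print(strict_cosched_wait(1, [1, 2, 3, 4], 4, rush_ends=10))    # 1
print(strict_cosched_wait(4, [1, 2, 3, 4], 4, rush_ends=10))    # 13
print(strict_cosched_wait(4, [1, 2, 3, 4], 4, rush_ends=1000))  # None
```

Because the cores free up one at a time and each is immediately re-taken, a 1-vCPU VM starts at tick 1, but a 4-vCPU VM never fits while the rush lasts. If the rush ends at tick 10, the cores drain and the 4-vCPU VM starts at tick 13 (a 2-vCPU VM would start at tick 11).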

Of course, we can think about this in terms of Cody's donuts. Suppose the donut shop makes you wait on your order of 4 donuts until it has served every customer who wants just 1. During the morning rush, when the line of single-donut customers seems never-ending, you could wait forever for your 4 donuts.

OK, OK, I think you get the point…so let me continue…
This problem can be diagnosed by examining the %RDY values on the ESX Server, but solving it is another matter entirely. If you were willing to give up all of your vSMP virtual machines, you could make the problem disappear instantly. However, many mission-critical applications need the performance benefits of SMP architectures. Forcing them off virtual infrastructure would significantly limit the value of virtualization.
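If you are looking at vCenter performance charts rather than esxtop, CPU ready is reported as a summation: milliseconds of ready time accumulated over each sample interval (20 seconds in the real-time charts). Here is a minimal sketch of converting that into a %RDY figure. The function name is hypothetical, and the optional per-vCPU division assumes the VM-level counter is the total across all of the VM's vCPUs.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms accumulated over one sample
    interval) into a percentage of that interval.

    interval_s: 20 s for vCenter real-time charts; rolled-up charts use
    longer intervals, so pass the interval that matches your data.
    vcpus: divide a VM-level total by the vCPU count to get an average
    per-vCPU figure (assumes the total sums all vCPUs).
    """
    return 100.0 * ready_ms / (interval_s * 1000.0) / vcpus


print(cpu_ready_percent(1000))           # 1 s of ready in a 20 s sample: 5.0
print(cpu_ready_percent(4000, vcpus=4))  # 20% VM-wide, 5.0 per vCPU
```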

It gets worse. As traffic demands increase, you will often want to allocate more resources to your vSMP virtual machines. Consider an application that is running on a 2-vCPU VM. Suppose you want to speed up processing during peak traffic by doubling its allocation to 4 vCPUs. What do you do if, instead of faster processing, you see a dramatic slowdown? The slowdown comes from longer waits in the CPU ready queue: finding 4 free cores at once may take substantially longer than finding 2.
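A back-of-the-envelope calculation shows why going from 2 to 4 vCPUs can hurt so much. The sketch below assumes, purely for illustration, that each of a host's cores is idle independently with probability `p_idle` at every scheduling slot, and that a VM must find enough idle cores for all of its vCPUs in the same slot (strict co-scheduling). Under those assumptions the chance of a fit is a binomial tail and the mean wait is geometric. Real hosts are neither independent nor memoryless, and the function names are hypothetical.

```python
from math import comb


def prob_fit(cores: int, needed: int, p_idle: float) -> float:
    """P(at least `needed` of `cores` are idle in the same slot), assuming
    each core is idle independently with probability p_idle."""
    return sum(comb(cores, k) * p_idle ** k * (1 - p_idle) ** (cores - k)
               for k in range(needed, cores + 1))


def mean_wait_slots(cores: int, needed: int, p_idle: float) -> float:
    """Expected number of slots until the first fit (geometric distribution),
    counting the slot in which the VM finally runs."""
    return 1.0 / prob_fit(cores, needed, p_idle)


for width in (2, 4):
    print(width, prob_fit(4, width, 0.5), mean_wait_slots(4, width, 0.5))
```

On a 4-core host where each core is idle half the time, a 2-vCPU VM fits in 11/16 of slots (a mean wait of about 1.5 slots), while a 4-vCPU VM fits in only 1/16 of them (a mean wait of 16 slots).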

Most customers we speak with are running some fairly beefy virtual machines that require more CPU horsepower than a single-vCPU virtual machine can provide. So they try to strike a balance, periodically checking whether the vSMP virtual machines on a given ESX Server show high %RDY values. When they do, these customers usually try to VMotion the VMs to different hosts.
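The periodic check described above can be sketched as a simple filter: take a per-vCPU %RDY reading for each VM, keep only the multi-vCPU VMs above a threshold, and group them by host to see where a VMotion might help. The `VmReady` record, the function name, and the sample readings are hypothetical, and the threshold is yours to choose, not a VMware-published limit.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class VmReady(NamedTuple):
    name: str
    host: str
    vcpus: int
    rdy_pct_per_vcpu: float  # average %RDY per vCPU for one sample


def vmotion_candidates(vms: List[VmReady], threshold_pct: float) -> Dict[str, List[str]]:
    """Group multi-vCPU VMs whose per-vCPU %RDY exceeds threshold_pct by host."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for vm in vms:
        if vm.vcpus > 1 and vm.rdy_pct_per_vcpu > threshold_pct:
            by_host[vm.host].append(vm.name)
    return dict(by_host)


# Hypothetical readings from one polling pass.
sample = [
    VmReady("sql01", "esx-a", 4, 7.5),
    VmReady("web01", "esx-a", 1, 9.0),  # single vCPU: not a vSMP, skipped
    VmReady("app01", "esx-b", 2, 2.5),
    VmReady("etl01", "esx-b", 4, 6.0),
]
print(vmotion_candidates(sample, threshold_pct=5.0))
# {'esx-a': ['sql01'], 'esx-b': ['etl01']}
```

Note that this is still a point-in-time check: it shows where ready time is high during this polling pass, not where it will be an hour from now.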

Unfortunately, solving this issue is a real-time problem. CPU ready fluctuates as demand on the virtual infrastructure changes, and as demand on the applications running on that infrastructure changes. Acting on a point-in-time snapshot of the environment may solve the problem right now, but it won't cure it for good.
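One way to treat ready time as the moving target it is: keep evaluating it over a sliding window instead of acting on a single snapshot. The sketch below flags every sample at which the rolling average of %RDY over the last `window` samples exceeds a threshold. The function name, window size, threshold, and series are hypothetical illustrations, not measurements.

```python
from collections import deque
from typing import Deque, Iterable, List


def sustained_ready_alerts(samples: Iterable[float], window: int,
                           threshold_pct: float) -> List[int]:
    """Return indices where the mean %RDY of the last `window` samples
    exceeds threshold_pct. Works on a stream, one sample at a time."""
    recent: Deque[float] = deque(maxlen=window)  # oldest sample drops off
    alerts: List[int] = []
    for i, pct in enumerate(samples):
        recent.append(pct)
        if len(recent) == window and sum(recent) / window > threshold_pct:
            alerts.append(i)
    return alerts


# Hypothetical per-vCPU %RDY series: quiet, a busy spell, then quiet again.
series = [1, 1, 8, 9, 10, 2, 1, 1]
print(sustained_ready_alerts(series, window=3, threshold_pct=5.0))  # [3, 4, 5]
```

A snapshot taken at the first or last sample would read about 1% and suggest all is well, while the rolling check flags samples 3 through 5.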

Now is probably a good time to mention that one of the most popular features in VMTurbo’s virtual appliance is real-time virtual machine rightsizing. Our virtual appliance ensures that all of your virtual machines are running at the right size, at the right time, even as demands on your applications and infrastructure change. If you’d like to rightsize your environment, let us know and we’ll give you a link to download our virtual appliance. Download and be rightsized in just minutes!


Category: Performance

  • Josh,
    I'm pro-#DonutZen, but I need to correct you on how ESX handles scheduling of multi-proc VMs. What you are describing is "strict" co-scheduling. ESX has not done that since ESX 2.5. In ESX 3, they introduced relaxed co-scheduling to allow a portion of the vCPUs of a multi-vCPU VM to run at the same time. ESX 4 relaxed this even further with better skew detection and handling. The following VMware documentation explains the ESX scheduler very well in 20 pages or so: http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf

    I agree though in principle that large VMs need to be justified and not doled out like candy.

    Thanks for bringing prominence to donuts.

    Sean Clark
    #DonutZenMaster



  • Sean- thanks for the comment.

    What we've seen in the field pretty consistently is that even customers running more recent versions of ESX, with relaxed co-scheduling, still have many VMs with high %RDY. Relaxed co-scheduling helps, but there are still many folks in the wild who are struggling with the performance and VM sizing implications...at least in our neck of the woods and among the folks we have been speaking with. Of course, your mileage may vary!

    Time to make the donuts.... ;)

    Thanks again for contributing.

    -John
  • John,
    Mileage will definitely vary, that is an eternal truth when it comes to performance and capacity planning.

    Just so readers aren't confused (it can be a confusing subject for folks not chest-deep in VMware), there is a good treatment of Ready Time here: http://communities.vmware.com/docs/DOC-7390. Something that often gets confused is interpreting the %RDY stat for multi-proc VMs. A multi-proc VM actually has a %RDY calculation for each "world" (each vCPU, plus overhead), and these are tallied to create the default %RDY stat you see when you first open esxtop. If folks don't know this is an aggregate stat, they might unnecessarily jump to the conclusion that the VM is performing badly, when in reality each vCPU's %RDY time might be within the limits laid out in the link above. To see %RDY per world, type "e" within esxtop's CPU screen to expand a VM and show all its worlds.

    John, I'm probably preaching to the choir here. But I just want to make sure that your readers have some good additional tools/resources on their journey to understanding %RDY and how to manage performance of their larger VMs.

    -Sean Clark
    @vSeanClark
  • Sean - just wanted to let you know we updated the post with some additional content and requisite donut picture :) Thanks again for participating in the thread.
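Sean's point about aggregate versus per-world %RDY is easy to see with numbers. The sketch below takes %RDY values for each world of one VM, as you might read them after expanding the VM with "e" in esxtop's CPU screen, and reports the VM-level aggregate alongside the busiest individual world. The world names and values are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Dict, Tuple


def summarize_worlds(world_rdy: Dict[str, float]) -> Tuple[float, str, float]:
    """Return (aggregate %RDY, busiest world, its %RDY) for one VM.

    The aggregate is the sum over worlds, matching how the collapsed
    VM-level %RDY is described in the comment above; the per-world values
    show whether any single vCPU is actually waiting too long.
    """
    aggregate = sum(world_rdy.values())
    worst = max(world_rdy, key=lambda w: world_rdy[w])
    return aggregate, worst, world_rdy[worst]


worlds = {"vcpu-0": 3.0, "vcpu-1": 4.0, "vcpu-2": 2.5, "vcpu-3": 3.5}
print(summarize_worlds(worlds))  # (13.0, 'vcpu-1', 4.0)
```

Here the aggregate reads 13%, which looks alarming at first glance, yet no single vCPU world is above 4%.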