Benchmarks: ESX vs Hyper-V vs XenServer

No matter how hard you look, you are very unlikely to find a performance comparison that involves Citrix XenServer, Microsoft Hyper-V and VMware ESX.
The VMware End User License Agreement (EULA) specifically says that the company won’t recognize any third-party performance testing unless it has had the chance to review and approve the methodology used.

(Before June 2006 the situation was even worse, as VMware simply didn’t allow any benchmark comparison to be published.)

Under these conditions, the chances that you’ll see an independent benchmark where VMware is outperformed by its competitors are zero.

Despite that, last week a group of brave reporters at Virtualization Review challenged VMware and published an independent analysis without asking for permission.

To ensure the validity of our test results and testing environment, Virtualization Review enlisted the help of Stuart Yarost to formulate and validate the test plan. Yarost is an ASQ Certified Software Quality Engineer and Certified Quality Engineer with more than 22 years’ experience in the software and quality fields. Yarost currently holds the position of Vice Chair of Programs for the ASQ Software Division.

The results are more than interesting:

  • In our tests, Hyper-V did well in all categories-it’s a real, viable competitor for the competition.
  • XenServer’s test results are impressive, but are they enough to justify a replacement of your current hypervisor? For environments with virtualized systems that have a large number of CPU- and memory-intensive workloads, it may be a good choice. The caution is that those high I/O workloads flirt with not being good virtualization candidates, so some administrators might instinctively place these workloads on physical systems. Make no mistake, however: XenServer did extremely well, posting excellent performance numbers.
  • For the first two tests of heavy workloads, VMware underperformed both XenServer and Hyper-V. For the lighter workloads on the third test, the results were almost indistinguishable across the platforms, but ESX had the best results in three of the four categories.

As you might guess, VMware is not happy, and yesterday it severely criticized Virtualization Review on its corporate blog, Virtual Reality, in a post titled “A big step backwards for virtualization benchmarking”.

The list of objections is long:

    • The fact that ESX is completing so many more CPU, memory, and disk operations than Hyper-V obviously means that cycles were being used on those components as opposed to SQL Server.  Which is the right place for the hypervisor to schedule resources?  It’s not possible to tell from the scarce details in the results.
    • All resource-intensive SQL Servers in virtual and native environments have large pages enabled.  ESX supports this behavior but no other hypervisor does.  This test didn’t use that key application and OS feature.
    • The effects of data placement with respect to partition alignment were not planned for.  VMware has documented the impact of this oversight to be very significant in some cases.
    • The disk tests are based on Passmark’s load generation, which uses a test file in the guest operating system.  But the placement of that file, and its alignment with respect to the disk system, was not controlled in this test.
    • The SQL Server workload was custom built and has not been investigated, characterized, or understood by anyone in the industry. As a result, its sensitivity to memory, CPU, network and storage changes is totally unknown, and not documented by the author.  There are plenty of industry standard benchmarks to use with hypervisors and the days of ad hoc benchmark tests have passed.  Virtual machines are fully capable of running the common benchmarks that users know and understand like TPC, SPECweb and SPECjbb.  An even better test is VMmark, a well-rounded test of hypervisor performance that has been adopted by all major server vendors as the standard measurement of virtualization platforms or the related SPECvirt benchmark under development by SPEC.
    • With ESX’s highest recorded storage throughput already measured at over 100,000 IOPS on hundreds of disks, this experiment’s use of an undocumented, but presumably very small, number of spindles would obviously result in a storage system bottleneck. Yet storage performance results vary by tremendous amounts. Clearly there’s an inconsistency in the configuration.

VMware stresses that this analysis was not reviewed and approved, and that work of this kind is the reason it has not removed the EULA restriction.
And to be absolutely sure that everybody knows about the flaws of this benchmark, this morning the company sent out an alert to its entire Channel.

How did the other two vendors react?

Citrix hasn’t commented so far, while Microsoft validated the study by linking to it on its corporate blog.
Now, if they want to defend the Hyper-V score in this benchmark, they had better publish a counter-analysis explaining why VMware is wrong.