After almost one year of beta testing, VMware has finally released the first version of its benchmarking platform: VMmark.
Measuring performance on a virtualization platform is much more complex than doing so on a traditional physical system, because many factors impact the results. One of them is the amount and arrangement of workloads: how many virtual machines are running on the same host, and which kinds of applications they are serving.
For this reason the VMware approach to benchmarking relies on the concept of tiles, groups of six virtual machines each running a typical business application, and on an overall score:
The unit of work for a benchmark of virtualized consolidation environments can be naturally defined as a collection of virtual machines executing a set of diverse workloads. The VMmark Benchmark refers to this unit of work as a tile. The total number of tiles that a physical system can accommodate gives a coarse-grain measure of that system’s consolidation capacity. This concept is similar to some server benchmarks, such as TPC-C, which scale the workload in a step-wise fashion to increase the system load.
Tiles are relatively heavyweight objects that cannot by themselves capture small variations in system performance. To address this, both the number of tiles and the performance of each individual workload determine the overall benchmark score.
VMmark also collects its metrics in an uncommon way, measuring client-server transactions rather than raw numbers achieved in local interaction between software and hardware:
These metrics are collected at frequent intervals during the course of the run. The standard VMmark workload is designed to run for 3 hours with workload metrics reported every 60 seconds. This means that rather than having a single number upon completion of a test run, the user will have a series of numbers for each of the workloads. However, each workload score is defined as a single number: the average of a consecutive subset of the series of datapoints for that workload.
The steady state for the benchmark is defined as the middle two hours of the three-hour run. The first and last half hours are the ramp-up and ramp-down times, respectively. The steady state is further divided into three 40-minute sections. For each of the 40-minute sections we compute the result for the tile and select the median score of the three as the score for the tile.
The resulting per-tile scores are then summed to create the final metric. Normalization allows the integration of the different component metrics into an overall score.
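The scoring procedure described in the quoted passages can be sketched in a few lines of Python. This is an illustration, not VMware's implementation: the reference values used for normalization are hypothetical, and the use of a geometric mean to combine workload scores within a tile is an assumption (the quoted rules only say that normalization integrates the component metrics).

```python
import statistics

RAMP_MINUTES = 30      # first and last half hour are ramp-up/ramp-down
SECTION_MINUTES = 40   # steady state is split into three 40-minute sections

def tile_score(samples, reference):
    """samples: {workload: list of 180 per-minute metric values}
    reference: {workload: reference value used for normalization}
    (hypothetical reference values; VMware supplies the real ones)."""
    section_scores = []
    for s in range(3):
        start = RAMP_MINUTES + s * SECTION_MINUTES
        window = slice(start, start + SECTION_MINUTES)
        # each workload's score is the average of its datapoints
        # over the section, normalized against a reference value
        normalized = [
            statistics.mean(values[window]) / reference[workload]
            for workload, values in samples.items()
        ]
        # combining the six workload scores with a geometric mean
        # is an assumption; the rules only mention "normalization"
        section_scores.append(statistics.geometric_mean(normalized))
    # the median of the three 40-minute sections is the tile's score
    return statistics.median(section_scores)

def overall_score(tiles, reference):
    # per-tile scores are summed to produce the final metric
    return sum(tile_score(tile, reference) for tile in tiles)
```

With this structure, adding a tile raises the score by roughly one unit when every workload performs at its reference level, which matches the idea of tiles as a coarse-grained measure of consolidation capacity refined by per-workload performance.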
This approach implies that benchmarkers have to set up a complex client-server infrastructure, mimicking a production environment.
As in the past, VMware doesn't allow the publication of VMmark-based analyses without formal approval:
VMware encourages all VMmark benchmarkers to formally submit a full disclosure report of their VMmark results and supporting documentation to VMware for review and publication on the VMmark website. VMware has established the review process described below to insure that all run and reporting rules have been followed and that the disclosure includes the details needed to reproduce the result.
Any VMmark result that the benchmarker wishes to formally submit for publication on the VMmark website should provide the full disclosure report and the supporting documentation to VMware a minimum of ten (10) business days prior to the planned publication date.
…
the VMmark test was not run in full compliance with these rules, VMware will not accept that result for publication and the benchmarker must not use that result in any public disclosure.
On top of that, VMware puts severe restrictions on using VMmark results to compare against other benchmarking approaches or other benchmarked platforms:
…
Comparisons of VMmark metrics and submetrics to any other benchmark metrics are not allowed. VMmark utilizes other benchmarking software as load generators and produces results which are not comparable to the original benchmarks’ metrics.
…
Competitive comparisons using VMmark results in academic or research papers are not allowed.
Download VMmark free of charge here.
It will be interesting to see how other vendors (mainly XenSource, Virtual Iron and Microsoft) will react to this methodology, which is neither standard nor free of severe restrictions.
VMmark is a remarkable effort to address a growing need in the virtualization industry, but without broad acceptance, and in the face of solid technical opposition, the results it generates may be useless for customers.
This is the reason why VMware (along with SWsoft, IBM, Sun, and others) is also working with SPEC to develop a standard benchmarking framework, which will hopefully give customers more unbiased results.