The VMware Performance Team has published results on its corporate blog comparing ESX Server performance in NUMA and non-NUMA configurations:
I recently performed some NUMA characterizations using VMmark on an older HP DL585 with 4 2.2 GHz dual-core Opterons. In the DL585, each dual-core processor is in its own NUMA node. I wanted to measure how heavily we stress the NUMA interconnect links, known as HyperTransport (HT) on the Opteron. I ran tests with one VMmark tile (6 VMs), two VMmark tiles (12 VMs), three VMmark tiles (18 VMs), and four VMmark tiles (24 VMs). The tests consumed 27%, 58%, 90%, and 100% of the system CPU resources, respectively.
…
The most important result is that the HT utilization remains below 20% in all cases. This implies that we have a large amount of headroom in the memory subsystem, which can be used as processor speeds increase. More importantly, the transition to quad-core systems should also be smooth, especially since newer versions of the HT links should provide even better performance.
…
I then repeated the experiment with the DL585 configured in memory-interleave (non-NUMA) mode in order to quantify the benefits of using NUMA on this system… The tests also consumed slightly more CPU resources than the NUMA configuration at each load level due to the higher average memory latencies caused by the high proportion of remote accesses. The average CPU utilization was 30%, 62%, 95%, and 100% with 6 VMs, 12 VMs, 18 VMs, and 24 VMs, respectively…
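To put the two sets of CPU figures quoted above side by side, here is a minimal Python sketch that computes the extra CPU consumed in memory-interleave mode at each load level. The utilization numbers come straight from the quoted text; the variable names and the cpu_overhead helper are illustrative assumptions and are not part of VMware's analysis.

```python
# CPU utilization (%) as quoted above, for 6, 12, 18, and 24 VMs.
vms = [6, 12, 18, 24]
numa_cpu = [27, 58, 90, 100]        # NUMA mode
interleave_cpu = [30, 62, 95, 100]  # memory-interleave (non-NUMA) mode


def cpu_overhead(numa, interleaved):
    """Return (extra percentage points, relative increase in %) for interleave mode."""
    delta = interleaved - numa
    return delta, 100.0 * delta / numa


# Hypothetical helper applied to each load level in turn.
overheads = [cpu_overhead(n, i) for n, i in zip(numa_cpu, interleave_cpu)]

for count, (delta, rel) in zip(vms, overheads):
    print(f"{count:2d} VMs: +{delta} points ({rel:.1f}% relative increase)")
```

Note that at 24 VMs both configurations report 100% utilization, so the CPU is saturated and this comparison cannot show any difference at that load level.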
Read the whole article at the source.