At the end of our long journey through the virtualization world, we have finally arrived at the last stage of enterprise management.
After evaluating which security solutions best fit our needs in the last instalment of this series, we only have to consider performance measurement and reporting before we have a really solid understanding of every aspect of a virtualization adoption project.
Tracking virtual machine performance, whether to pinpoint problems or simply to produce meaningful reports of resource consumption, is actually one of the most complex tasks for virtualization professionals. And not only because a virtual machine's behaviour is strictly tied to the underlying host, but also because it heavily depends on what the other virtual machines are doing.
And as with the other challenges faced so far, we will see that the market currently offers few products really able to address it.
Virtualization needs new metrics
First of all, we must understand that the traditional ways of measuring performance in a datacenter do not successfully apply to virtual infrastructures.
Depending on your perspective, virtual servers are either practically identical to physical servers or completely different.
Looked at from the inside, a virtual machine offers all the traditional counters a performance monitor may need and usually tracks, so existing reporting products seem good enough if you simply install their agents in every guest operating system.
But in a virtual world some of the obtained numbers are much less valuable, while others are simply meaningless.
A typical example is memory consumption and memory paging in a VMware ESX Server environment.
VMware’s flagship product has a special feature called ballooning.
Thanks to ballooning, ESX can temporarily use for other purposes some of the memory the system administrator assigned to a virtual machine: at any moment a special driver included in the VMware Tools can request memory from the guest OS, just like inflating a balloon; the memory it obtains is freed and immediately reallocated to other VMs that need it.
While this happens the guest operating system is forced to page out, showing unexpected, slight performance degradation.
When everything is back to normal, ESX deflates the balloon and gives the memory back to its original virtual machine.
In the above scenario the guest OS reports incorrect memory and page file usage, which may lead to completely wrong deductions about how the virtual machine is performing.
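The divergence between what an in-guest agent sees and what the host actually grants can be illustrated with a toy model. Everything here is hypothetical (the class, the numbers, the counter names); it is not a real VMware API, just a sketch of why ballooned memory makes guest-side counters misleading.

```python
# Toy model of memory ballooning (illustrative only, not a VMware API).
class GuestVM:
    def __init__(self, name, assigned_mb):
        self.name = name
        self.assigned_mb = assigned_mb  # what the administrator configured
        self.balloon_mb = 0             # memory pinned by the balloon driver

    def inflate_balloon(self, mb):
        """The balloon driver allocates memory inside the guest so the
        hypervisor can hand the backing pages to other VMs."""
        self.balloon_mb += mb

    def deflate_balloon(self, mb):
        self.balloon_mb = max(0, self.balloon_mb - mb)

    def guest_reported_used_mb(self, workload_mb):
        # An in-guest agent counts the balloon's pages as "used" memory,
        # even though the workload did not consume them.
        return workload_mb + self.balloon_mb

    def host_granted_mb(self):
        # What the host is really backing with physical RAM.
        return self.assigned_mb - self.balloon_mb

vm = GuestVM("web01", assigned_mb=2048)
workload = 900                 # MB actually used by applications
vm.inflate_balloon(512)        # host reclaims 512 MB for other VMs
print(vm.guest_reported_used_mb(workload))  # 1412 -> looks like pressure
print(vm.host_granted_mb())                 # 1536 -> the real allocation
```

The guest-side number (1412 MB "used") suggests memory pressure, while the host knows that 512 MB of it is just the inflated balloon; only by reading both levels can a reporting tool tell the two cases apart.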
Going further, we can easily see how some other measurements make sense only in relation to what is happening on the host.
When a virtual machine frequently reports very high CPU usage, we cannot simply conclude it is time for a virtual hardware upgrade, add a second virtual CPU, and feel confident about an improvement.
Sometimes excessively high vCPU usage means the virtual machine is not being served fast enough at the host level, which may require fine-tuning the hypervisor's resource management or increasing the number of physical CPUs. And this can be discovered only by tracking specific values at the host level.
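This two-level diagnosis can be sketched as a simple decision rule. The function name, the 80% usage and 10% ready-time thresholds are all illustrative assumptions; in an ESX environment the "CPU ready" figure (how long a vCPU waited for a physical CPU) is a host-level counter, exactly the kind of value the paragraph above says must be tracked outside the guest.

```python
# Hedged sketch: is a busy VM compute-starved or host-starved?
# Thresholds and names are illustrative, not vendor defaults.
def diagnose(vcpu_usage_pct, cpu_ready_pct, ready_threshold=10.0):
    """Suggest a tuning action from a guest-level counter (vCPU usage)
    and a host-level counter (CPU ready time)."""
    if vcpu_usage_pct < 80.0:
        return "no action"
    if cpu_ready_pct > ready_threshold:
        # The VM is busy *and* waiting for physical CPU time: the host
        # is oversubscribed, so adding vCPUs would likely make it worse.
        return "tune host scheduling or add physical CPUs"
    # High usage with little waiting: the workload really needs more compute.
    return "consider adding a vCPU"

print(diagnose(95.0, 2.0))   # consider adding a vCPU
print(diagnose(95.0, 18.0))  # tune host scheduling or add physical CPUs
```

The same guest-side reading (95% vCPU usage) leads to opposite recommendations depending on the host-side counter, which is the whole point: neither level is sufficient on its own.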
So we need to change our measuring approach, but what exactly do we need to track?
In a highly dense virtual datacenter, with tens of virtual machines on a single host, we absolutely must consider interdependencies, and track the whole system as a single entity rather than as a sum of elements.
And since the relationship between virtual machines and hosts becomes critical, reporting solutions have to handle the fluidity of the virtual datacenter, seamlessly adapting to hot and cold migrations of guest operating systems within the infrastructure.
Last but not least, these products have to address scalability: when administrators have to consider the performance of thousands of virtual machines deployed on hundreds of hosts, reporting solutions must work in a fully automated fashion and provide smart summaries that are still human-readable and meaningful.
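One minimal form such a summary can take is a host-level roll-up: instead of emitting one report per VM, group guest metrics by host and surface only the hosts where aggregate demand approaches capacity. The data layout, metric, and 90% alert ratio below are all hypothetical, just a sketch of "automated, human-readable summaries" at scale.

```python
# Minimal sketch of host-level roll-up reporting (hypothetical data model).
from collections import defaultdict

def summarize(samples, host_capacity_mhz, alert_ratio=0.9):
    """samples: iterable of (host, vm, cpu_mhz) tuples.
    Returns per-host CPU totals plus the short list of hosts
    worth a human look (aggregate demand above alert_ratio)."""
    per_host = defaultdict(float)
    for host, _vm, cpu_mhz in samples:
        per_host[host] += cpu_mhz
    alerts = [h for h, used in per_host.items()
              if used / host_capacity_mhz[h] > alert_ratio]
    return dict(per_host), alerts

samples = [("esx1", "vm-a", 4200.0), ("esx1", "vm-b", 5100.0),
           ("esx2", "vm-c", 1800.0)]
capacity = {"esx1": 10000.0, "esx2": 10000.0}
totals, alerts = summarize(samples, capacity)
print(alerts)  # ['esx1']  (9300/10000 exceeds the 0.9 ratio)
```

With thousands of VMs the raw sample list is unreadable, but the output stays proportional to the number of *problem hosts*, which is what keeps a fully automated report meaningful to a human.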
Populating an almost empty segment
The performance tracking and reporting segment is one of the emptiest in today's virtualization industry.
That is partly because of complexity, partly because demand is still small, and partly because of little awareness that traditional solutions are quickly becoming inadequate.
Obviously, virtualization platform vendors offer more or less enhanced reporting tools, but at the moment none of them addresses customer needs with a serious, dedicated solution.
So for now we have to look to third-party virtualization ISVs, which so far provide only a few products addressing a limited market.
Among the current players we can certainly mention vizioncore, which focuses exclusively on VMware with its esxRanger.
The product provides a wide range of charts tracking the performance history of virtual machines and hosts, and it is a very good entry-level product.
vizioncore also offers a free edition, which grants low-budget departments a decent ability to understand what is happening in their infrastructure.
Devil Mountain Software (DMS) tries to embrace a much wider audience with its Clarity Suite 2006, supporting hardware virtualization solutions (VMware, Microsoft, but only Windows-based virtual machines) as well as application virtualization ones (Softricity, Altiris).
Clarity Suite is a hosted solution more focused on profiling virtualized workloads, comparing their performance with a scoring system.
The solution performs some simple correlations between virtual machine and host metrics, useful for capacity planning and what-if scenarios, but it is still far from being the most complete reporting system for virtualized environments.
Like vizioncore, DMS also offers a free version of Clarity Suite, which is unfortunately very limited both in features and in the number of deployable agents.
A last company worth mentioning is newcomer Netuitive, which, like vizioncore, focuses on VMware ESX Server only but offers innovative features: its SI solution automatically profiles virtual machine and host performance, creating behaviour profiles which it correlates and uses to recognize anomalous behaviours.
As soon as anomalies appear, Netuitive SI reacts by asking the VMware infrastructure to reconfigure its resource pools, so that performance bottlenecks are addressed immediately, well before any human intervention.
Is reporting the first aspect where datacenter automation will begin?
This article originally appeared on SearchServerVirtualization.