R&D: Accelerating live VM migration by 4-10x and beyond

Posted by virtualization.info Staff   |   Tuesday, April 14th, 2009   |  

Guest star author: Kevin Lawton, Lead developer of Bochs.

Live VM migration (VMotion in VMware parlance) is a key technology underlying a number of useful features. For example, VMware's DRS and DPM features use migration to perform load balancing and power management, respectively. These are in essence high-level forms of scheduling, though at a much coarser time granularity than that at which an operating system schedules.

Given a migration within the same storage and networking domains, there is still a considerable amount of VM memory that has to be transferred between the source and destination servers through a finite amount of network bandwidth. On a 1GbE network, for example, a VM with 2GB of RAM might have a best-case migration time on the order of 20 seconds. On a 10GbE network, the same VM might have a best-case migration time on the order of 2 seconds. In some cases, live migration takes minutes to complete.

Using relatively slow VM migration as a mechanism for scheduling has a number of risks and shortcomings which leave its full potential untapped.
This is necessarily true because workloads can ebb, flow, and spike much more quickly than the scheduler can respond by re-scheduling VMs on other servers. As a result, the scheduler has to be ultra-conservative; otherwise it may break SLAs and/or create troublesome load-based hot-spots. By contrast, if the scheduler could expect near-instantaneous VM migrations, it could perform much higher-fidelity load balancing or much more efficient power management (packing VMs onto the absolute fewest powered-on servers). Thus, as live VM migration times decrease, less conservatism is needed, and more of the potential performance and power savings can be wrung out of existing resources.

Accelerating VM migration
One of the keys to accelerating VM migration time is to make use of duplicate memory throughout the compute fabric.

As is the case within a given physical server, there is generally a considerable amount of duplicate memory content across servers. The more similar the VMs, the higher the percentage of duplication. But rather than look for intra-server memory duplication, my research looks at memory contents across a whole fabric (cloud) of servers.
Each server becomes a member of a collective which acts in concert to identify and properly mark memory duplication throughout the fabric. Of course, this requires some new infrastructure and a much broader set of techniques for memory duplication analysis, but it can potentially plug into existing virtualization hypervisors. A strong benefit is that by extending the amount of memory analyzed to the entire chosen universe of servers, much more duplication can be found, even if any given physical server hosts a diverse set of VMs at any one time. By contrast, intra-server content sharing is limited in scope to the memory contents of the current VM workloads: less opportunity, and more constrained by VM diversity.
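The fabric-wide duplication tracking above can be sketched in a few lines. This is a minimal illustration, not the patented design: it assumes 4 KiB pages and uses a content hash per page to record which servers already hold which contents (the class and method names are hypothetical).

```python
import hashlib

PAGE_SIZE = 4096  # assume 4 KiB pages


def page_hashes(memory: bytes):
    """Yield a content hash for each page of a VM's memory image."""
    for off in range(0, len(memory), PAGE_SIZE):
        yield hashlib.sha256(memory[off:off + PAGE_SIZE]).digest()


class FabricIndex:
    """Fabric-wide index: which servers already hold a given page's contents."""

    def __init__(self):
        self.locations = {}  # page hash -> set of server ids

    def publish(self, server_id, memory: bytes):
        """A member server reports the contents it currently holds."""
        for h in page_hashes(memory):
            self.locations.setdefault(h, set()).add(server_id)

    def present_on(self, server_id, page: bytes) -> bool:
        """Is this page's content already resident on the given server?"""
        h = hashlib.sha256(page).digest()
        return server_id in self.locations.get(h, set())
```

A real implementation would shard this index across the collective and age entries out as memory changes; the point is only that membership lookups replace bulk transfers.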

With a distributed memory duplication network in place, some really tangible benefits result.
The first is that memory which is recognized to exist on both the source and destination of a migration does not need to be transferred. It is simply copied from the existing memory contents on the destination (or referred to by a pointer). This eliminates both the network transfer time associated with the duplicate state and the bandwidth it would have consumed.
For example, if one could find 75% redundancy, only 25% of the memory contents, plus some meta-information about the elided contents, need be transferred.
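The sender-side elision can be sketched as follows, assuming the destination's known page hashes have already been gathered from the duplication network (the function name and wire format are illustrative, not the actual protocol):

```python
import hashlib

PAGE_SIZE = 4096  # assume 4 KiB pages


def plan_transfer(memory: bytes, dest_hashes: set):
    """Split a VM image into pages the destination already holds (send only
    a tiny hash reference) and pages that must travel in full."""
    refs, payload = [], []
    for off in range(0, len(memory), PAGE_SIZE):
        page = memory[off:off + PAGE_SIZE]
        h = hashlib.sha256(page).digest()
        if h in dest_hashes:
            refs.append((off, h))        # duplicate: reference only
        else:
            payload.append((off, page))  # unique: full page over the wire
    return refs, payload


# 75% redundancy in miniature: 4 pages, 3 of them a zero page the
# destination already has, so only 1 page of real data is sent.
zero = b"\x00" * PAGE_SIZE
mem = zero * 3 + b"\x01" * PAGE_SIZE
dest = {hashlib.sha256(zero).digest()}
refs, payload = plan_transfer(mem, dest)
print(len(refs), len(payload))  # 3 1
```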

Now, this turns out to be a leveraged optimization due to the way live migration works. Generally, a pre-copy of memory state is sent in a first pass while the VM continues execution. Subsequent passes transfer only the deltas since the last pass, until a threshold (a small enough amount of remaining data) is reached, at which point the VM is stunned and the remaining deltas are transferred.
Any reduction in data transferred on the first pass is handsomely rewarded by narrowing the window of time (and thus the amount of memory deltas) for the second pass! A smaller second pass requires a smaller third pass, and so on.
To make this more concrete, a VMware talk (slide 11) shows a nominal progression of passes for a 2GB VM over a 1GbE network of 16s, 4s, 1s, and 0.25s. With the memory duplication network, my research shows this can nominally be reduced to 4s, 1s, and 0.25s: about 4x faster than current technology.
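The leverage is easy to model: each pass re-sends only the memory dirtied during the previous pass, so pass times shrink geometrically. A small sketch, with the 0.25 shrink ratio and 0.3s stun threshold chosen as assumptions to reproduce the slide's numbers:

```python
def precopy_times(first_pass_s, ratio, stop_s=0.3):
    """Pass n+1 transfers the memory dirtied during pass n, so each pass
    takes `ratio` times the previous one, until the remainder is small
    enough to stun the VM and send the final delta."""
    times = [first_pass_s]
    while times[-1] * ratio > stop_s:
        times.append(times[-1] * ratio)
    times.append(times[-1] * ratio)  # final pass with the VM stunned
    return times


baseline = precopy_times(16.0, 0.25)  # [16.0, 4.0, 1.0, 0.25]
dedup = precopy_times(4.0, 0.25)      # [4.0, 1.0, 0.25]
print(sum(baseline), sum(dedup))      # 21.25 5.25 -> ~4x faster
```

Cutting the first pass from 16s to 4s removes an entire pass from the sequence, which is why a 4x reduction in first-pass data yields roughly a 4x reduction in total migration time rather than a smaller one.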

Pushing the envelope
Optimizing by 4x would accelerate the ~20s migration case above to ~5s (1GbE) and the ~2s case to ~0.5s (10GbE)! 

That alone is enough to make the time granularities more attractive for far more aggressive load and power management scheduling. But there are ways to optimize even further, with various trade-offs of power/compute/memory. 

First, a number of techniques have been researched for optimizing memory sharing, such as sub-page granularity (up to 65% sharing) and a differencing engine (up to a phenomenal 90%). But a second technique can be used independently or in conjunction with these.
Given a distributed memory sharing network, even non-duplicate memory can be transferred (replicated) to other nodes in the network speculatively. As this replication is pure speculation, the corresponding memory contents can be dropped at will when the memory is needed for more immediate purposes. In effect, it uses the "spare" memory in the network of servers.
Of course, memory areas with a high rate of change are not good candidates for such replication. But I found in my research that it is not hard to reach 80-90% effective sharing using this technique, and thus VM migration acceleration can reach 5-10x before employing any exotic memory sharing techniques.
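The "droppable at will" property is what makes speculative replication safe to run in spare memory. A minimal sketch of such a replica store, using simple LRU eviction as an assumed (not prescribed) policy, with hypothetical method names:

```python
from collections import OrderedDict


class SpeculativeReplicaCache:
    """Holds speculative copies of other servers' pages in spare RAM.
    Every entry is pure speculation, so any of it can be evicted the
    moment the memory is needed by real workloads."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page hash -> page contents

    def replicate(self, page_hash, contents):
        """Accept a speculative copy, evicting the coldest entries if full."""
        self.pages[page_hash] = contents
        self.pages.move_to_end(page_hash)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def reclaim(self, n_pages):
        """Local workload needs memory: drop n speculative pages at will."""
        for _ in range(min(n_pages, len(self.pages))):
            self.pages.popitem(last=False)

    def fetch(self, page_hash):
        """A migration hit avoids the network; a miss falls back to transfer."""
        return self.pages.get(page_hash)
```

Because a miss merely falls back to a normal network transfer, replication never has to be correct or complete, only profitable on average.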

Another benefit of this technology becomes evident when doing longer-distance VM migrations over a more constrained network pipe.
This use case is still evolving, as evidenced by VMware's keynote talk at VMworld. But let's consider a VM migration between two distant data centers. Obviously, it's good to burden the finite long-haul network with less VM memory data, using the elision techniques.
What's also interesting is that the duplicate data does not have to reside on the destination server itself. It just needs to exist somewhere in the destination data center, preferably near the destination.
In that case, we can transfer the duplicate data in a short-haul fashion intra-data-center, and the unique data long-haul inter-data-center. And we can do it in parallel!
Given a scalable memory sharing network infrastructure, the bigger the cloud of virtualization, the more of these kinds of optimization opportunities exist, and the more spare memory is available to speculate with for unique memory replication.
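The parallel short-haul/long-haul split can be sketched with two concurrent fetches. The fetch functions here are hypothetical stubs standing in for real intra- and inter-data-center transports:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_local(duplicate_hashes):
    """Short-haul: pull duplicate pages from a nearby server inside the
    destination data center (stubbed here)."""
    return {h: b"<page from local peer>" for h in duplicate_hashes}


def fetch_remote(unique_offsets):
    """Long-haul: pull unique pages from the distant source (stubbed)."""
    return {o: b"<page over WAN>" for o in unique_offsets}


def migrate(duplicate_hashes, unique_offsets):
    """Run both transfers in parallel; the slow WAN link carries only
    the unique data, the fast local network carries the duplicates."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(fetch_local, duplicate_hashes)
        remote = pool.submit(fetch_remote, unique_offsets)
        return local.result(), remote.result()
```

With 75%+ of pages satisfied locally, the long-haul transfer shrinks to the unique fraction, and the two legs overlap instead of adding up.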

Power
The first VM executed on a hypervisor costs the most in terms of power, as there is a lot of overhead in powering up a server and its related chips. Adding more VMs (and thus requiring more MHz and more power) costs incrementally less, as the initial power-on costs have already been paid. Finer-grained scheduling allows a greater average load (and VM density) on a physical server, and thus better utilization and power efficiency of resources. I estimated from a rough model that utilization could be increased by another 15+% if more rapid scheduling is used. That can translate into 15+% less hardware, and/or some non-trivial power savings.

Scheduling
I believe scheduling for load balancing and power management will continue evolving for some time. What's nice about the techniques described herein is that a much richer body of knowledge exists to feed into scheduling decisions. As the sharing potential between any two VMs can be divined, inter-server scheduling can, for example, place more similar VMs on the same physical server to get better density. And, as has long been the case, a server often runs out of RAM before it runs out of compute capacity, so better VM density is important for obtaining peak utilization.
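Sharing-aware placement can be illustrated with a toy greedy scheduler. This is a sketch under stated assumptions, not the scheduler described above: VMs are represented by their page-hash sets, sharing potential is measured as Jaccard similarity, and a fixed per-server VM capacity stands in for real RAM/CPU constraints:

```python
def jaccard(a: set, b: set) -> float:
    """Sharing potential between two page-hash sets (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def place(vm_hashes, servers, capacity=2):
    """Greedy placement: put each VM on the non-full server whose resident
    pages overlap most with it, so similar VMs land together for density."""
    placement = {s: [] for s in servers}
    resident = {s: set() for s in servers}  # union of resident page hashes
    for vm, hashes in vm_hashes.items():
        candidates = [s for s in servers if len(placement[s]) < capacity]
        best = max(candidates, key=lambda s: jaccard(hashes, resident[s]))
        placement[best].append(vm)
        resident[best] |= hashes
    return placement


# Two near-identical web VMs get co-located; the dissimilar DB VM does not.
vms = {"web1": {1, 2, 3}, "web2": {1, 2, 4}, "db1": {7, 8, 9}}
print(place(vms, ["s1", "s2"]))  # {'s1': ['web1', 'web2'], 's2': ['db1']}
```

A production scheduler would of course weigh load, SLAs, and migration cost alongside similarity; the point is that the duplication network supplies the similarity signal for free.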

Growing with the Cloud
This is a technology with a very immediate benefit for the existing use case of virtualization, where most VMs tend not to live-migrate outside a particular physical location. But it scales nicely across multiple data center locations, and it will grow with virtualization as it spans multiple physical locations, as per the VMworld keynote. In the latter case, in fact, it will be absolutely critical to optimize VM migration times and reduce the amount of network bandwidth consumed. And I believe the technology will couple nicely with future storage and networking continuity solutions.

 

About the author

Kevin Lawton is a pioneer in x86 virtualization, a serial entrepreneur, a business and technology visionary, a prolific idea creator, and a news and business book junkie. He was a founding team member of a microprocessor startup, is the author and lead of two open source projects, and is a public speaker. He has a degree in computer science and started his career at MIT Lincoln Laboratory.

Contact him here. Note that the research is patent pending.


