Guest star author: Kevin Lawton, Lead developer of Bochs.
Live VM migration, VMotion in VMware parlance, is a key technology underlying a number of useful features. For example, VMware’s DRS and DPM features use migration to perform load balancing and power management, respectively. These are in essence high-level forms of scheduling, though at a much coarser time granularity than that at which an operating system schedules.
Even for a migration within the same storage and networking domains, a considerable amount of VM memory still has to be transferred between the source and destination servers over a finite amount of network bandwidth. On a 1GbE network, for example, a VM with 2GB of RAM might have a best-case migration time on the order of 20 seconds. On a 10GbE network, the same VM might have a best-case migration time on the order of 2 seconds. In some cases, live migration takes minutes to complete.
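The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a rough sketch: the function name and the `efficiency` parameter are hypothetical, it assumes a single pass over RAM with no re-sending of pages dirtied during the copy, and it treats 2GB as 2 GiB. Real migrations will take longer.

```python
def best_case_migration_seconds(ram_gib: float, link_gbps: float,
                                efficiency: float = 1.0) -> float:
    """Lower bound on the time to copy a VM's RAM once over the network.

    ram_gib    -- VM memory size in GiB (assumption: 1 GiB = 2**30 bytes)
    link_gbps  -- nominal link speed in gigabits per second (1e9 bits/s)
    efficiency -- hypothetical fraction of the link usable for payload
    """
    ram_bits = ram_gib * 2**30 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return ram_bits / usable_bits_per_second


# A 2 GiB VM on 1GbE and 10GbE, assuming a perfectly utilized link.
print(f"{best_case_migration_seconds(2, 1):.1f} s")    # 17.2 s
print(f"{best_case_migration_seconds(2, 10):.1f} s")   # 1.7 s

# Assuming only 85% of the 1GbE link carries payload (an illustrative guess).
print(f"{best_case_migration_seconds(2, 1, efficiency=0.85):.1f} s")  # 20.2 s
```

With protocol overhead factored in, the 1GbE case lands near the 20 seconds quoted above, and the 10GbE case near 2 seconds.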
Using relatively slow VM migration as a mechanism for scheduling has a number of risks and shortcomings, which leave its full potential untapped.
This is necessarily true, because workloads can ebb and flow (and spike) much faster than the scheduler can respond by re-scheduling VMs onto other servers. As a result, the scheduler has to be ultra-conservative; otherwise it may break SLAs and/or create troublesome load-based hot-spots. By contrast, if the scheduler could expect near-instantaneous VM migrations, it could perform much higher-fidelity load balancing or much more efficient power management (packing VMs onto the absolute fewest powered-on servers). Thus, as live VM migration times decrease, less conservatism is needed, and more performance and power savings can be wrung out of existing resources.
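To make the link between conservatism and power savings concrete, here is a small illustrative sketch. It is not how DRS or DPM work internally. It uses a generic first-fit-decreasing packing heuristic, and the function name, the `reserved` parameter, and the load figures are all made up for the example. The idea is that a scheduler which must keep more spare capacity on each server, because it cannot migrate VMs away quickly, ends up needing more powered-on servers.

```python
def servers_needed(vm_loads, capacity, reserved):
    """Pack VM loads onto servers with first-fit decreasing.

    vm_loads -- load of each VM, in the same units as capacity
    capacity -- total capacity of one server
    reserved -- capacity kept free per server as a safety margin
                (hypothetical stand-in for scheduler conservatism)
    Returns the number of powered-on servers used.
    """
    usable = capacity - reserved
    servers = []  # current load on each powered-on server
    for load in sorted(vm_loads, reverse=True):
        if load > usable:
            raise ValueError("a VM does not fit on any server")
        for i, used in enumerate(servers):
            if used + load <= usable:
                servers[i] += load  # first server with room wins
                break
        else:
            servers.append(load)    # power on another server
    return len(servers)


# Made-up VM loads, as a percentage of one server's capacity.
loads = [30, 30, 20, 20, 20, 20, 10, 10]

# Slow migration: keep 40% of every server free to absorb spikes.
print(servers_needed(loads, capacity=100, reserved=40))  # 3

# Fast migration: a 10% margin is assumed to be enough.
print(servers_needed(loads, capacity=100, reserved=10))  # 2
```

In this toy case, shrinking the safety margin lets the same eight VMs run on two servers instead of three, so one server can be powered off.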







