Are XenServer and Hyper-V live migration technologies unreliable?

Mike DiPetrillo, a Specialist System Engineer in the Industry Research and Competitive Analysis department at VMware, describes on his personal blog some of the technical issues that arise when virtual machines are live migrated between Intel and AMD CPUs:

…Basically you’re running an OS on an Intel box and let’s say the processor supports the SSE3 instruction set and your app happens to use that instruction. Now you migrate that to an AMD box that doesn’t support SSE3 but the app is still using it and trying to use it. BAM! Your app and your OS will crash. This can happen with VMotion and Microsoft Quick Migration. Actually anyone that does live migration will get impacted by this. There are several “user mode” instructions like this that we can’t mask out at the virtualization layer…
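The failure DiPetrillo describes comes down to a set difference. Any instruction-set extension the guest could see on the source host but that the target CPU lacks is a potential crash after migration. The minimal Python sketch below assumes each host's capabilities are available as the space-separated `flags` line from Linux `/proc/cpuinfo`. The function names and sample flag lists are hypothetical and for illustration only. Note that Linux reports SSE3 under the flag name `pni`.

```python
# Hypothetical sketch: find CPU features a migrating guest could lose.
# Assumes each host's capabilities come from the "flags" line of
# Linux /proc/cpuinfo (where SSE3 is reported as "pni").


def parse_cpuinfo_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_target(source_flags: set, target_flags: set) -> set:
    """Features available on the source host but absent on the target.

    A guest application that detected and started using any of these
    (for example SSE3) may fault after being migrated to the target.
    """
    return source_flags - target_flags


if __name__ == "__main__":
    # Illustrative, abbreviated flag lists (not captured from real hosts).
    source_cpuinfo = "processor\t: 0\nflags\t\t: fpu sse sse2 pni ssse3\n"
    target_cpuinfo = "processor\t: 0\nflags\t\t: fpu sse sse2\n"

    lost = missing_on_target(
        parse_cpuinfo_flags(source_cpuinfo),
        parse_cpuinfo_flags(target_cpuinfo),
    )
    if lost:
        print("Unsafe to migrate; target lacks:", sorted(lost))
    else:
        print("Target exposes every source feature")
```

The sketch uses a one-way set difference rather than an equality test because the scenario in the quote is a feature disappearing from under a running guest. A target that offers extra features the guest never detected is not that problem.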

But DiPetrillo goes further, stating that the Microsoft and Citrix implementations of this technology, found in the upcoming Hyper-V and the current XenServer, may pose a serious risk to virtual machines:

…With the Xen based live migration and Microsoft Quick Migration they do not perform the check and so you can actually do the migration but your app and your OS may die as a result.

…that’s why we say you can’t migrate from Intel to AMD just yet and this is why anyone that says they can do it is lying to you or just don’t understand the technology. The later happens to be true with most of VMware’s competitors – especially the field sales…

Update: As one might expect, both Microsoft and Citrix promptly responded to DiPetrillo's statements.

Answer from Ben Armstrong, Program Manager on Core Virtualization at Microsoft:

Here [In planned fail-over] the virtual machines are placed into a saved state on the source physical computer and are then restored on the target physical computer. Since there is state transferred here there are issues with processor compatibility. For this reason we state here that you should have compatible processors for all computers involved in a virtualization cluster.

So what happens if you try to configure a cluster with Intel / AMD processors?
Unfortunately we are the only server product / role that cares about the processor type beyond “x86 or x64” so Windows Server Fail-over Clustering will happily let you create such a configuration.
When you then try to perform a planned fail-over of a virtual machine it will be placed in saved state on the source physical computer, but when we try to restore it on the target physical computer we will detect that the processor is not compatible and will fail the request. The virtual machine can then be safely restored on a compatible system.
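Armstrong's sequence can be modeled as a small state machine. The VM is saved on the source, a restore is attempted on the target, the restore is refused if the processor is incompatible, and the VM is left in saved state. In this hypothetical Python sketch, "compatible" is simplified to a matching CPU vendor string. Hyper-V's actual compatibility test is not detailed in the source, and none of these classes or functions correspond to a Hyper-V API.

```python
# Hypothetical model of a planned fail-over with a restore-time CPU check.
# "Compatible" is simplified to an identical CPU vendor string; this is an
# assumption for illustration, not Hyper-V's real compatibility rule.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    name: str
    cpu_vendor: str  # e.g. "GenuineIntel" or "AuthenticAMD"


@dataclass
class VirtualMachine:
    name: str
    host: Host
    state: str = "running"  # "running" or "saved"
    saved_cpu_vendor: Optional[str] = None


class IncompatibleProcessorError(Exception):
    """Raised when a saved VM cannot be restored on the chosen host."""


def save(vm: VirtualMachine) -> None:
    """Place the VM in saved state, recording the CPU it was saved on."""
    vm.state = "saved"
    vm.saved_cpu_vendor = vm.host.cpu_vendor


def restore(vm: VirtualMachine, target: Host) -> None:
    """Restore a saved VM on target, refusing if the CPU is incompatible."""
    if vm.state != "saved":
        raise ValueError(f"{vm.name} is not in saved state")
    if target.cpu_vendor != vm.saved_cpu_vendor:
        # Refuse the restore; the VM stays saved and can be restored elsewhere.
        raise IncompatibleProcessorError(
            f"{vm.name} was saved on {vm.saved_cpu_vendor}, "
            f"target {target.name} is {target.cpu_vendor}"
        )
    vm.host = target
    vm.state = "running"


def planned_failover(vm: VirtualMachine, target: Host) -> None:
    """Save on the current host, then attempt to restore on the target."""
    save(vm)
    restore(vm, target)


if __name__ == "__main__":
    intel_a = Host("node-a", "GenuineIntel")
    intel_b = Host("node-b", "GenuineIntel")
    amd_c = Host("node-c", "AuthenticAMD")

    vm = VirtualMachine("web01", intel_a)
    try:
        planned_failover(vm, amd_c)
    except IncompatibleProcessorError as err:
        print("Fail-over refused:", err)
        restore(vm, intel_b)  # recover on a compatible host
    print(vm.name, vm.state, "on", vm.host.name)
```

The key property the sketch tries to capture is that the check happens before the guest resumes. A refused restore therefore leaves the VM intact in saved state instead of crashed.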

Answer from Simon Crosby, CTO of the Virtualization and Management Division at Citrix:

…Both in XenServer and in open source Xen, we require a match on CPU processor vendor, family and stepping before a migration can be performed.
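A precondition like the one Crosby describes requires the same processor vendor, family and stepping on both hosts. The minimal Python sketch below assumes these three values are read from the `vendor_id`, `cpu family` and `stepping` fields of Linux `/proc/cpuinfo`. The helper names and sample values are hypothetical, and this is not XenServer's implementation.

```python
# Hypothetical pre-migration check: vendor, family and stepping must match.
# Values are parsed from /proc/cpuinfo-style text; this illustrates the rule
# quoted above and is not code taken from Xen or XenServer.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CpuIdentity:
    vendor: str    # "vendor_id", e.g. "GenuineIntel" or "AuthenticAMD"
    family: int    # "cpu family"
    stepping: int  # "stepping"


def parse_cpu_identity(cpuinfo_text: str) -> CpuIdentity:
    """Read vendor, family and stepping from the first processor entry."""
    fields: Dict[str, str] = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip() and fields:
            break  # a blank line ends the first processor's entry
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return CpuIdentity(
        vendor=fields["vendor_id"],
        family=int(fields["cpu family"]),
        stepping=int(fields["stepping"]),
    )


def migration_blockers(source: CpuIdentity, target: CpuIdentity) -> List[str]:
    """Names of the fields that differ; an empty list means migration may proceed."""
    return [
        name
        for name in ("vendor", "family", "stepping")
        if getattr(source, name) != getattr(target, name)
    ]


if __name__ == "__main__":
    # Illustrative sample values, not captured from real hosts.
    intel = parse_cpu_identity(
        "vendor_id\t: GenuineIntel\ncpu family\t: 6\nmodel\t\t: 15\nstepping\t: 6\n"
    )
    amd = parse_cpu_identity(
        "vendor_id\t: AuthenticAMD\ncpu family\t: 15\nmodel\t\t: 65\nstepping\t: 2\n"
    )
    blockers = migration_blockers(intel, amd)
    print("Migration refused:" if blockers else "Migration allowed", blockers)
```

Returning the mismatched field names rather than a bare boolean lets a management tool explain why a migration was blocked. This is the kind of up-front warning an administrator would want before a guest is moved.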

In both cases, Mike DiPetrillo verified the statements and corrected his original post accordingly:

…I’m not sure why my original setup did not check for CPU compatibility. I’m also not sure why the guts of Quick Migration and failover – HAVM.vbs found at the end of the instructions for setup – does not clearly show anywhere that a check occurs. Never-the-less, if you happen to try migrating between incompatible processor types then you get the following warning in the interface followed by a VM left in a suspended state and awaiting instructions on where to go and run…

…I was using an older version of XenSource and relying on some experience with open source implementations in RHEL5 and SLES10. The RHEL5 and SLES10 implementations still do not perform CPU checks or if they do they certainly don’t tell you about it or warn you when you do a migration. Thankfully my test VM was a simple RHEL5 guest and just migrated between the Intel and AMD systems I had without worries. No warning. No caution. Just migrate. My XenSource 4.1 install did produce a warning when adding the AMD host to the Intel cluster…