Now and Xen

Quoting from Linux Magazine:


How would you like to run several operating systems at once on the same physical hardware with virtually no performance overhead — and for free? That’s the promise and the purpose of Xen, a relatively new open source project that turns one piece of hardware into many, virtually. If you’re looking to cut costs or maximize usage or both, follow the path to Xen.

Hardware virtualization allows multiple operating systems to run simultaneously on the same hardware. With such a system, many servers can run on the same physical host, providing more cost-effective use of valuable resources, including CPU, power, and space. Additionally, separate instances of one or more operating systems can be isolated from each other, providing an additional degree of security and easier management of system-wide resources like configuration files and library versions.

Up until now, there have been no open source solutions for efficient, low-level virtualization of operating systems. But now there’s Xen, a virtual machine manager (VMM) developed at the University of Cambridge.

Xen uses a technique called paravirtualization, where the operating system that is to be virtualized is modified, but the applications run unmodified. Paravirtualization achieves unparalleled performance, while still supporting existing application binaries.

At the moment, Xen supports a slightly modified Linux 2.4 kernel and NetBSD, with full support of OpenBSD coming in a few months. Xen even supports an experimental version of Windows XP (however, XP cannot be distributed, except to those who’ve signed Microsoft’s academic license), and ports of Linux 2.6 and Plan 9 are in development.

Xen 1.0 has been publicly available for just over a year, and Xen 2.0 will be released shortly after you read this. This article discusses the benefits of hardware virtualization, explains why Xen was built in the first place, and previews some of the exciting, new features available in 2.0.

What is Xen?

Think of Xen as a next generation BIOS: Xen is a minimally invasive manager that shares physical resources (such as memory, CPU, network, and disks) among a set of operating systems. Effectively, Xen is transparent, as each operating system believes it has full control of its own physical machine. In fact, each operating system can be managed completely independently of the others.

Moreover, Xen divides resources very rigidly: it’s impossible for a misbehaving guest (an operating system that runs on a Xen host) to deny service to other guests. Simultaneous yet discrete operation is incredibly valuable.

For example, consider the problems inherent with hosting a set of services for different user groups. Perhaps you’re an application service provider, selling rack mount web server accounts. Or, perhaps you want to install a set of dissimilar services on the same physical host, but want to avoid the overhead of trying to get system-wide configuration files to play nicely with all of them. Xen allows the installation of many operating system instances on the same host.

Xen is also useful in factoring servers for enterprise administration. The database administrator and web administrator may have entirely separate OS configurations, root shells, and so on, while sharing common physical hardware.

Virtualization has applications for home users, too. For example, consider the benefit of application sandboxing: applications that are at risk for attack by worms or viruses (think web browsers and email clients) can be run within a completely separate virtual machine. If, for whatever reason, one sandbox becomes infected, it can simply be destroyed and recreated, leaving the rest of the system untouched. The same applies for downloading applications off of the Internet that you don’t necessarily trust, like games or file sharing tools — just run them in a separate, isolated, OS instance.

Unlike User Mode Linux (UML, see http://www.linux-mag.com/2004-01/uml_01.html) and Bochs (see http://www.linux-mag.com/2003-10/guru_01.html), Xen provides excellent performance. Unlike virtual servers, Xen provides real low-level resource isolation, preventing individual operating system instances from interfering with the performance of others. And unlike commercial virtualization packages, Xen is free.

Paravirtualization

Many other existing packages for virtualization do what’s often referred to as pure virtualization. In pure virtualization, the virtualization layer presents an exact replica of the underlying physical hardware to the operating systems that run above it. Many CPUs make such a form of virtualization very easy, in some cases even providing specific support for it.

One big benefit of pure virtualization is that the operating system software need not be modified to run, because it sees the illusion of raw hardware. Unfortunately, x86 processors do not provide specific support for virtualization. More specifically, they don’t virtualize very well at all. (To understand why pure virtualization is so inefficient, see the sidebar “Why Pure Virtualization is Bad.”)

Xen’s approach to virtualization is called paravirtualization: the interfaces presented to the operating system are not those of the raw physical devices. While paravirtualization enhances performance, it comes at a cost: operating system code must be modified before it can run on Xen. In essence, Xen is a new architecture, slightly different from x86, that operating systems must be ported to.

———————————————–
Sidebar: Why Pure Virtualization is Bad

There are three crucial problems with purely virtualizing the x86 architecture, and all are very difficult to address, as solutions are bound to introduce a severe performance overhead.

– PAGE TABLES
Memory management is quite tricky to virtualize effectively. The virtual machine manager often provides the guest with a shadow page table, which appears to be a set of physically contiguous memory, and then remaps all accesses to this page table behind the scenes (at considerable cost).

Xen’s approach is to let the OS know what pages of memory it really has (machine addresses) and then allow a mapping onto a contiguous range (pseudo-physical addresses). This means that the OS can have raw access to its page table, with Xen being involved only to validate updates for safety (specifically, to prevent one OS from attempting to map memory that doesn’t belong to it).
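
To make the split concrete, here is a minimal sketch in C of the two address spaces and the validation step. The names and layout (pseudo_to_machine, validated_pte_update, the toy ownership check) are invented for this example and are not Xen’s actual interfaces; they only show the shape of the idea: the guest translates its own addresses, and every page-table write goes through a check.

    /* Illustrative sketch only; not Xen's real page-table interface. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_PAGES 4

    /* Contiguous pseudo-physical page numbers, as the guest sees them,
     * mapped to the scattered machine frames it was actually given. */
    static uint64_t pseudo_to_machine[NR_PAGES] = { 0x2a0, 0x113, 0x7f4, 0x009 };

    /* Toy stand-in for Xen's ownership tables. */
    static int guest_owns(uint64_t mfn)
    {
        for (int i = 0; i < NR_PAGES; i++)
            if (pseudo_to_machine[i] == mfn)
                return 1;
        return 0;
    }

    /* Stand-in for the validation Xen performs on every page-table update:
     * the guest writes its own page tables, but only through a checked call. */
    static int validated_pte_update(uint64_t *pte, uint64_t new_pte)
    {
        uint64_t mfn = new_pte >> 12;
        if (!guest_owns(mfn))
            return -1;        /* refuse to map someone else's memory */
        *pte = new_pte;
        return 0;
    }

    int main(void)
    {
        uint64_t pte = 0;
        /* The guest translates pseudo-physical page 2 to a machine frame... */
        uint64_t mfn = pseudo_to_machine[2];
        /* ...and asks the "hypervisor" to install the mapping. */
        if (validated_pte_update(&pte, (mfn << 12) | 0x3) == 0)
            printf("mapped: pte=%#llx\n", (unsigned long long)pte);
        /* Mapping a frame the guest does not own is rejected. */
        if (validated_pte_update(&pte, (0x999ULL << 12) | 0x3) != 0)
            printf("rejected mapping of a frame this guest does not own\n");
        return 0;
    }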

– PRIVILEGED INSTRUCTIONS
Certain instructions on x86 (pushf, for instance) behave as intended only when run in supervisor mode (CPU ring zero, where the operating system normally lives). However, when virtualized, the operating system no longer runs at that level, and these instructions fail silently rather than trapping, so the virtual machine manager never gets a chance to intercept and emulate them.

In full virtualization this is commonly addressed with a technique called code scanning: the virtual machine manager examines the executing binary and redirects these calls. But since this run-time scanning can be very expensive, Xen does it beforehand. One of the tasks involved in porting an OS to Xen is to replace privileged instructions with the appropriate calls.
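
The following fragment sketches what that replacement looks like in practice. The hyper_* functions are stand-ins invented for this example (Xen’s real hypercall interface differs); the point is simply that the ported kernel makes an explicit call where the native kernel would have executed a privileged instruction such as cli or sti.

    /* Illustrative sketch of "replace privileged instructions with calls". */
    #include <stdio.h>

    /* State the hypervisor tracks on behalf of the deprivileged guest;
     * in a real system this would live in memory shared with Xen. */
    static int virtual_interrupts_enabled = 1;

    /* Stand-ins for hypercalls/shared-memory updates: the guest no longer
     * executes cli/sti itself, since outside ring zero those instructions
     * would not have the intended effect. */
    static void hyper_disable_interrupts(void) { virtual_interrupts_enabled = 0; }
    static void hyper_enable_interrupts(void)  { virtual_interrupts_enabled = 1; }

    /* Native kernel code might have been:
     *     __asm__ volatile ("cli");  ...critical section...  __asm__ volatile ("sti");
     * The paravirtualized port makes explicit calls instead: */
    static void guest_critical_section(void)
    {
        hyper_disable_interrupts();
        /* ... touch per-CPU data that must not be interrupted ... */
        hyper_enable_interrupts();
    }

    int main(void)
    {
        guest_critical_section();
        printf("interrupts enabled again: %d\n", virtual_interrupts_enabled);
        return 0;
    }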

– I/O DEVICES
Sharing I/O devices such as network cards with pure virtualization means that the device driver in the guest OS must be able to interact with what it thinks is the raw physical device.

Rather than providing support for a virtualized version of every possible peripheral device, one approach is to map all underlying devices to the illusion of a single common one. This means that as long as the operating system running on top has support for that device, it will run without problems. Unfortunately, this also means that the system ends up running two device drivers for each device. In the case of network interfaces, extra device drivers typically mean extra copies, and so result in a per-byte overhead on each packet sent and received.

The paravirtualization approach to this problem is to provide the guest with a single idealized driver for each class of device. In the case of network interfaces, the guest OS driver interacts with a pair of buffers that allow messages to be sent and received without incurring an extra copy as they pass to Xen.
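
Here is a toy version of such a buffer pair: a shared ring of descriptors through which sender and receiver exchange references to pages rather than copies of the data. The structure and field names are invented for illustration, and Xen’s real I/O rings differ in detail, but the producer/consumer pattern is the same.

    /* Illustrative sketch of a shared descriptor ring; not Xen's real layout. */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 8   /* power of two so indices can wrap with a mask */

    struct ring_desc {
        uint64_t page_addr;   /* page holding the packet (passed by reference) */
        uint32_t length;      /* bytes of payload within the page */
    };

    struct shared_ring {
        volatile uint32_t prod;            /* written by the sender */
        volatile uint32_t cons;            /* written by the receiver */
        struct ring_desc desc[RING_SIZE];
    };

    /* Guest side: post a packet by publishing a descriptor; no data copy. */
    static int ring_send(struct shared_ring *r, uint64_t page, uint32_t len)
    {
        if (r->prod - r->cons == RING_SIZE)
            return -1;                     /* ring full */
        r->desc[r->prod & (RING_SIZE - 1)] = (struct ring_desc){ page, len };
        r->prod++;                         /* a real ring adds a memory barrier here */
        return 0;
    }

    /* Backend side: consume a descriptor and hand the page to the real driver. */
    static int ring_recv(struct shared_ring *r, struct ring_desc *out)
    {
        if (r->cons == r->prod)
            return -1;                     /* ring empty */
        *out = r->desc[r->cons & (RING_SIZE - 1)];
        r->cons++;
        return 0;
    }

    int main(void)
    {
        struct shared_ring ring = { 0 };
        struct ring_desc d;
        ring_send(&ring, 0x12345000, 1514);
        if (ring_recv(&ring, &d) == 0)
            printf("received %u bytes via page %#llx\n",
                   (unsigned)d.length, (unsigned long long)d.page_addr);
        return 0;
    }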

Solving these problems for pure virtualization is hard work, and several other software projects have made heroic efforts to reduce the associated performance costs.

In designing Xen, the software’s development team came to the conclusion that these just weren’t the right problems to solve. Paravirtualization seems to work quite a bit better, despite the one-time effort of porting an OS. And that cost is actually slight: the original port of Linux to run on Xen involved changing or adding about three thousand lines of source code, representing about 1.5 percent of the total Linux source. Moreover, about half of the changes are in the code for the new device drivers.
———————————————–

With Xen, most of the changes required to paravirtualize an existing OS are in the architecture-specific part of the operating system code. (The Linux 2.6 for Xen effort aims to further isolate the code in hopes that Xen will be included as a separate architecture within the 2.7 kernels.)

The paravirtualization of device drivers (described in the “Why Pure Virtualization is Bad” sidebar) adds another benefit: device drivers only need to be implemented once for all operating systems. Any guest can use any driver that’s supported by Xen.

Xen’s Latest Tricks

The initial release of Xen focused largely on making virtualization work and providing hard performance isolation between guest operating systems. In the year since that release, many new features have been added that really demonstrate the benefits of virtualization.

– IMPROVED RELIABILITY
Because Xen strictly isolates operating system instances, system reliability is enhanced.

Device drivers are commonly seen as a major source of instability. As drivers run in the kernel, driver bugs have a tendency to run amok, corrupting system memory and causing crashes.

In the original release of Xen, device drivers ran within Xen itself, exporting a common interface to all guests regardless of the specific device they were using. This simplified device support in the guest, but was ultimately a bad decision, because a driver crash could potentially crash Xen itself, just like in a non-virtualized OS.

In Xen 2.0, the Xen developers attacked this problem head-on, moving drivers up into their own guest OS domains. Drivers now run in an isolated virtual machine in the same way that a guest operating system does, yet drivers remain shared between guests as before. When a new domain is configured, the administrator chooses its hardware. Examining a hardware bus from within a guest only reveals the devices that have been exported to it.

The performance of placing device drivers in a completely separate OS instance is surprisingly good. Xen 2.0 includes specific mechanisms for the page-flipping that was used to transfer network data in the original release of Xen. Guests can share and exchange pages at very low overhead, and Xen carefully tracks page ownership to ensure stability in the case of a crashing or misbehaving guest.
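
The ownership tracking can be pictured with a small sketch: a table records which domain owns each machine frame, and a page can only be “flipped” to another domain by its current owner. The table and function names below are invented for illustration; Xen’s real transfer machinery is considerably more involved.

    /* Illustrative sketch of page flipping with ownership tracking. */
    #include <stdio.h>

    #define NR_FRAMES 16

    /* Which domain currently owns each machine frame (all start at domain 0). */
    static int frame_owner[NR_FRAMES];

    /* Transfer ownership of one frame from src to dst, so a packet's data
     * moves between domains without being copied.  The flip is refused if
     * the frame does not belong to the source domain. */
    static int flip_page(int frame, int src_dom, int dst_dom)
    {
        if (frame < 0 || frame >= NR_FRAMES || frame_owner[frame] != src_dom)
            return -1;             /* not yours to give away */
        frame_owner[frame] = dst_dom;
        return 0;
    }

    int main(void)
    {
        /* The driver domain hands frame 5 (holding a received packet) to guest 3. */
        if (flip_page(5, 0, 3) == 0)
            printf("frame 5 now owned by domain %d\n", frame_owner[5]);
        /* Guest 3 cannot give away a frame it does not own. */
        if (flip_page(7, 3, 0) != 0)
            printf("flip rejected: frame 7 is not owned by domain 3\n");
        return 0;
    }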

The additional cost to consider is context switch times, because now both the driver and the guest must be scheduled before an inbound packet or disk block is received. Fortunately, due to the bulk nature of both of these types of devices, drivers are largely able to batch requests, resulting in minimal performance degradation.

Xen can still allow raw device access to guests that need it by making the hardware visible to a guest. This is suitable for devices that are generally used by a single domain, such as video and sound, with one caveat: allowing device DMA access to guests is very dangerous. On the x86, DMA has unchecked access to physical memory, and so an erroneous (or malicious) target address can result in the overwriting of arbitrary system memory. Hopefully, newer I/O MMU support in emerging hardware can help address this particular issue, as it’s a major problem in existing systems.

In the common case though, where raw device access isn’t needed, driver isolation adds plenty to reliability. As an added benefit, driver crashes may be corrected in a running system. A privileged guest in Xen can be configured to monitor the health of each driver. Should the driver become unresponsive, crash, or attempt to consume excessive resources, it can be killed and restarted. Fault-injection experiments have shown that restarts are very fast, on the order of a hundred microseconds. A network driver can crash and be restarted almost unnoticed while a transfer is in progress.

Finally, there are commonly large differences between drivers for the same device on different operating systems. A Linux driver may expose hardware features that are missing from its Windows counterpart, or a Windows driver may exist where Linux simply isn’t supported. Such disparities are largely due to organization: driver support for a specific platform needs an interested community of users to demand it, and considerable OS expertise to develop it.

Virtualization puts an interesting twist on the age-old problem of driver support. Hardware drivers can be written once, for whatever OS their authors choose. Xen’s current, sample drivers are Linux drivers running on a cut-down Linux kernel. With those in place, all that’s left to do is write the idealized drivers for each guest OS to interface with the top of the hardware driver.

– SUSPEND AND RESUME
Encapsulating application and OS state within a managed virtual machine allows for a range of exciting system services. One of the most useful of these is the ability to suspend a virtual machine and resume it at another time or in another place.

For example, a complex application can be configured in isolation from the rest of the system and within its own OS instance, and can then be “canned” so that a fresh copy of the application can be quickly instantiated whenever necessary.

Suspending a VM requires Xen to store its configuration and execution context to a file. Configuration details include parameters such as CPU allocation, network connections, and disk-access privileges, while execution context contains memory pages and CPU and register states.

Although resuming a virtual machine is largely a matter of reinstating its configuration and reloading its execution context, it’s somewhat complicated by the fact that the newly-resumed virtual machine will be allocated a different set of physical memory pages. Since Xen doesn’t provide full memory virtualization, each guest OS is aware of the physical address of each page that it owns. Resuming a virtual machine therefore requires Xen to rewrite the page tables of each process, and rewrite any other OS data structures that happen to contain physical addresses. This task is relatively simple for XenLinux, as most parts of the OS use a pseudo-physical memory layout, which is translated to real physical addresses only for page-table accesses.
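
A simplified sketch of that fix-up looks like this: each saved page-table entry is rewritten from the machine frame the VM owned before suspension to the frame it was handed on resume, by going back through the pseudo-physical page number. The tables and names are invented for illustration, not taken from the XenLinux code.

    /* Illustrative sketch of the page-table fix-up performed on resume. */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_PAGES 4
    #define PTE_FLAGS_MASK 0xfffULL

    /* pseudo-physical -> machine mapping before suspend and after resume. */
    static uint64_t old_p2m[NR_PAGES] = { 0x2a0, 0x113, 0x7f4, 0x009 };
    static uint64_t new_p2m[NR_PAGES] = { 0x510, 0x0c2, 0x333, 0x6d1 };

    /* Reverse lookup: which pseudo-physical page did this old frame back? */
    static int old_machine_to_pseudo(uint64_t mfn)
    {
        for (int i = 0; i < NR_PAGES; i++)
            if (old_p2m[i] == mfn)
                return i;
        return -1;
    }

    /* Rewrite one saved PTE so it points at the newly allocated frame. */
    static uint64_t fix_pte(uint64_t pte)
    {
        int pfn = old_machine_to_pseudo(pte >> 12);
        if (pfn < 0)
            return pte;                   /* not a guest frame; leave alone */
        return (new_p2m[pfn] << 12) | (pte & PTE_FLAGS_MASK);
    }

    int main(void)
    {
        uint64_t saved_pte = (old_p2m[2] << 12) | 0x63;   /* present, dirty, ... */
        printf("pte %#llx -> %#llx\n",
               (unsigned long long)saved_pte,
               (unsigned long long)fix_pte(saved_pte));
        return 0;
    }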

– LIVE MIGRATION
Virtual machine migration can be thought of as a special form of suspend/resume, in which the state file is immediately transferred and resumed on a different target machine. Migration is particularly attractive in the data center, where it allows the current workload to be balanced dynamically across available rack space.

However, although Xen’s suspend/resume mechanism is very efficient, it may not be suitable for migrating latency-sensitive or high-availability applications. This is because the virtual machine cannot resume execution until its state file has been transferred to the target system, and this delay is largely determined by its memory size: a complex VM with a large memory allocation takes a correspondingly long time to transfer.

To avoid prolonged downtimes, Xen provides a migration engine that transfers a VM’s configuration information and memory image while the VM is still executing. The goal of the migration engine is to stop execution of the VM only while its (relatively tiny) register state is transferred. The “fly in the ointment” is that this can lead to an inconsistent memory image at the target machine if the VM modifies a memory location after it’s been copied. Xen avoids these inconsistencies by detecting when a memory page is updated after it is copied, and retransferring that page.

To do this without requiring OS modifications, Xen installs a shadow page table beneath the VM. In this mode of operation, the guest’s page table is no longer registered with the MMU. Instead, regions of the guest page table are translated and copied into the shadow table on demand.

Shadow page tables are not new: they are used in fully-virtualizing machine monitors such as VMware’s products to translate between a guest’s view of a contiguous physical memory space and the reality that its memory pages are scattered across the real physical memory space.

Shadow page tables are not used by the migration engine for full translation, but for dirty logging. The page mappings in the shadow table are therefore identical to those in the guest table, except for pages that the migration engine has transferred to the target system. Transferred pages are always converted to read-only access when their mappings are copied into the shadow table, and any attempt to update such a page causes a page fault. When Xen observes a write-access fault on a transferred page, it marks the page as “dirtied,” which informs the migration engine to schedule another transfer. Writable mappings of the page are again permitted until the page is retransferred (and again marked read-only).
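
Putting the pieces together, the migration engine behaves roughly like the loop below: send everything once while the guest keeps running, resend whatever the dirty log catches, and only pause the VM for the final, small round. The data structures and the threshold are illustrative, not Xen’s actual implementation.

    /* Schematic sketch of pre-copy migration with dirty-page logging. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_PAGES 1024

    static bool dirty[NR_PAGES];     /* set by the write-fault handler ("dirty log") */

    static void send_page(int pfn)   { (void)pfn; /* push page pfn over the network */ }
    static void pause_vm(void)       { /* stop guest execution */ }
    static void send_cpu_state(void) { /* registers etc., relatively tiny */ }

    static int count_dirty(void)
    {
        int n = 0;
        for (int i = 0; i < NR_PAGES; i++) n += dirty[i];
        return n;
    }

    static void migrate(void)
    {
        /* Round 0: transfer every page while the VM keeps running. */
        for (int i = 0; i < NR_PAGES; i++) send_page(i);

        /* Keep retransferring pages the guest dirtied after they were sent,
         * until the remaining dirty set is small enough. */
        while (count_dirty() > 32) {              /* threshold is illustrative */
            for (int i = 0; i < NR_PAGES; i++) {
                if (dirty[i]) {
                    dirty[i] = false;             /* re-protect, then resend */
                    send_page(i);
                }
            }
        }

        /* Final round: pause briefly, send the last few pages and CPU state. */
        pause_vm();
        for (int i = 0; i < NR_PAGES; i++)
            if (dirty[i]) send_page(i);
        send_cpu_state();
    }

    int main(void) { migrate(); printf("migration sketch complete\n"); return 0; }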

Future Work

Xen is still in active development. In fact, by the time you read this article, there will likely be many new features available. Here is just a small sample of what you can look forward to:

– FINE-GRAINED RESOURCE ACCOUNTING
One of the next releases of Xen will provide a real-time account of all the resources used by each active OS. This allows each guest OS to be charged for resources consumed and can also be used to establish consumption limits.

– XENOSERVERS
XenoServers is a project to globally distribute a set of Xen-based hosts. The intent is to deploy Xen on a broad set of hosts across the Internet as a platform for global service deployment. (More information is available on the XenoServers web site at http://www.xenoserver.org.)

Getting Xen

Xen and XenoLinux are available as a single ISO image that can be downloaded and burned to CD. The CD is bootable, so you can bring up a demo without modifying your system simply by booting off the Xen CD.

The ISO image is available from Sourceforge and via BitTorrent. See the Xen download page at http://www.cl.cam.ac.uk/Research/SRG/netos/xen/downloads.html for links.

The Xen development team continues to develop new features for Xen and is always looking for enthusiastic people to join the project. If you’d like to participate, drop us a line!