Server virtualization meets grid computing

GRIDtoday published an interesting interview with Kate Keahey, an Argonne National Laboratory scientist working on the Globus Toolkit and other aspects of Grid technology, about how server virtualization can serve distributed computing purposes:


Gt: Virtualization and distributed computing seem to permeate everything in IT today. Tell us about some of the ways virtualization is converging with distributed computing and how Grid technology fits in.

KEAHEY: I think of virtualization as a vehicle to realize the dream of Grid computing — obtaining on-demand computational resources from distributed sources in the same simple and intuitive way we get electricity today. Today, in order to run a job on the grid, a user has to identify a set of platforms capable of running that job by virtue of having the right installation of operating system, libraries, tools, and the right configuration of environment variables, etc. In practice, this means that the choice of platforms will either be limited to a very narrow set, or the job first has to be made compatible with an environment supported by a large resource provider, such as TeraGrid. For some applications this is a significant hurdle. Furthermore, even if you do manage to identify such an environment, it is hard to guarantee that the resource will be available when needed, for as long as needed, and that the user will get his or her fair share of that resource.

Virtualization introduces a layer of abstraction that turns the question around from “let’s see what resources are available and figure out if we can adapt our problem to use them” to “here is an environment I need to solve my problem — I want to have it deployed on the grid as described.” For a user this is a much simpler question. The issue is whether we can implement the middleware that will map such a virtual workspace onto physical resources. One way to implement it would be to provide an automated environment installation on a remote node.

But what really gives this idea a boost is using virtual machine technology to represent such a workspace. This makes the environment easy to describe (you just install it), easy to transport, fast to deploy and, thanks to recent research, very efficient. Best of all, virtual machine management tools nowadays allow you to enforce the resource quantum assigned to a specific virtual machine very accurately — so you could, for example, test or demo your application in a virtual cluster making sparing use of resources, and then redeploy the virtual cluster on a much more powerful resource for production runs. This is another powerful idea behind virtualization: the environment is no longer permanently tied to a specific amount of resources; rather, the resource quantum can be adjusted on demand.

Similarly, we can define virtual storage, implemented using distributed storage facilities, or overlay networks implemented on top of networking infrastructure. We can compose those constructs to put together whole “virtual grids” and test their operation before requesting serious resource allocations. There are many exciting ongoing research efforts in this area and some of them will be represented at the VTDC workshop.

Further down the road, if the idea of running virtual machines becomes ubiquitous, we may find other ways of leveraging the fact that we can have more than one isolated “hardware device” on a physical resource. We could use it to host physical devices requiring isolation for security reasons. We could carry around pluggable virtualized environments the way we carry laptops today. We could rely on migration to a greater extent to provide uninterrupted services. All those potential applications will come more clearly into focus once we see how widespread the appeal of virtual machines will prove in practice….
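
To make the workspace idea a bit more concrete, here is a minimal sketch in Python of what that “turned around” question could look like in practice. All of the names in it (Workspace, Allocation, deploy) are invented for this illustration and are not part of the Globus Toolkit or any workspace middleware; the point is simply that the user describes the environment once, deploys it with a sparing resource quantum for a test run, and redeploys the same, unchanged environment with a much larger quantum for production.

    # Hypothetical sketch (editor's illustration, not Globus Toolkit code):
    # a "virtual workspace" is described declaratively, deployed with a small
    # resource quantum for testing, then redeployed with a larger quantum
    # for production. All names here are invented for illustration.

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Allocation:
        """Resource quantum assigned to a deployed workspace."""
        nodes: int
        cpu_percent: int
        memory_mb: int
        hours: int


    @dataclass(frozen=True)
    class Workspace:
        """Declarative description of the environment the user needs."""
        image: str   # VM image with OS, libraries and tools pre-installed
        env: dict    # environment variables the application expects


    def deploy(workspace: Workspace, allocation: Allocation) -> str:
        """Stand-in for the middleware that maps a workspace onto physical resources."""
        print(f"deploying {workspace.image} on {allocation.nodes} node(s) "
              f"at {allocation.cpu_percent}% CPU for {allocation.hours}h")
        return "workspace-handle-0001"   # placeholder handle


    demo_env = Workspace(image="scientific-app.img", env={"DATA_DIR": "/scratch/run42"})

    # Sparing allocation for a test or demo run of the virtual cluster...
    test_handle = deploy(demo_env, Allocation(nodes=2, cpu_percent=25, memory_mb=512, hours=2))

    # ...and the same, unchanged environment redeployed with a much larger
    # resource quantum for production runs.
    prod_handle = deploy(demo_env, Allocation(nodes=64, cpu_percent=100, memory_mb=4096, hours=48))

The environment description never changes between the two calls; only the resource quantum does, which is exactly the decoupling Keahey describes.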

Read the whole interview at source.

I already covered this topic in January 2006, in my old post Virtualization is the first step of a long walk called Grid Computing.