How Microsoft and VMware use virtualization internally

Posted by Staff   |   Thursday, April 9th, 2009   |  

Who better than a virtualization vendor to show a successful case study to convince prospects to buy?

In May 2008 Microsoft published some details about how it’s using Hyper-V to serve MSDN and TechNet IIS7 web front-ends.

Neither VMware nor Citrix nor any other vendor has ever published information about its in-house implementation.
Recently, however, juicy additional details emerged about both the Microsoft and the VMware data centers.

How Microsoft is really using Hyper-V

The MSDN and TechNet case studies were interesting but lacked many details. A new document published in January 2009 on the TechNet library now tells a much more detailed (and in some cases concerning) story:

As early as September 2004, Microsoft IT calculated that the average CPU utilization for servers in data centers and managed lab environments was less than 10 percent, and continuing to decrease.

The virtualization goals are set very high for Microsoft IT, which has deployed more than 3,500 virtual machines. By June 2009, Microsoft IT plans to have 50 percent of all server instances running on virtual machines. With Windows Server 2008 Hyper-V, the expectation is that at least 80 percent of new server orders will be deployed as virtual machines.

As Microsoft IT developed standards for which physical machines to virtualize, it identified many lab and development servers with very low utilization and availability requirements. Because of the lower expectations, Microsoft IT now is deploying the lab and development virtual servers with four processor sockets, 16 to 24 processor cores, and up to 64 gigabytes (GB) of random access memory (RAM). These servers can host a large number of virtual machines, averaging 10.4 virtual machines per host machine.

As Microsoft IT developed its expertise in deploying virtual machines, and especially with the performance improvements available with Windows Server 2008 Hyper-V, it has increasingly moved toward virtualization of production servers. Although many production servers still have low utilization, some have significantly higher performance requirements than the lab and development computers. For the production-server deployments, Microsoft IT is using servers with two processor sockets, 8 to 12 processor cores, and 32 GB of RAM.

On average, the host servers with eight processors and 32 GB of RAM are hosting 5.7 virtual machines in the production environment.

Microsoft IT configures all virtual machine hosts to use a SAN to store the virtual machine configuration and hard disk files. The host computers connect to the SAN by using dual-path Fibre Channel host bus adapters (HBAs). For production virtual servers, the SAN storage uses redundant array of independent disks (RAID) 0+1, whereas RAID 5 is used for lab and development virtual machine storage. Microsoft IT has chosen the RAID 0+1 configuration for the production servers because it provides better performance, but it does consume more disks. Performance is not as critical in the lab environment, so Microsoft IT uses RAID5 because it uses fewer disks to store the virtual machines.
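The disk-count trade-off between the two RAID levels can be sketched with a little arithmetic. This is a minimal sketch, not Microsoft IT's sizing method: the 1,200 GB target and the 300 GB disk size are hypothetical values chosen for illustration, and the function name is ours.

```python
import math

def disks_needed(usable_gb: float, disk_gb: float, level: str) -> int:
    """Return how many equal-size disks provide the requested usable capacity."""
    data_disks = math.ceil(usable_gb / disk_gb)
    if level == "raid01":
        # Every data disk is mirrored, so only half of the raw capacity is usable.
        # A striped mirror needs at least four disks.
        return max(2 * data_disks, 4)
    if level == "raid5":
        # One disk's worth of capacity goes to parity; RAID 5 needs at least three disks.
        return max(data_disks + 1, 3)
    raise ValueError(f"unsupported RAID level: {level}")

# Hypothetical example: 1,200 GB of usable space built from 300 GB disks.
raid01_disks = disks_needed(1200, 300, "raid01")
raid5_disks = disks_needed(1200, 300, "raid5")
print(f"RAID 0+1: {raid01_disks} disks, RAID 5: {raid5_disks} disks")
```

With these assumed numbers the mirrored layout needs 8 disks against 5 for RAID 5, which is the "consumes more disks" cost the document mentions.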

When Microsoft IT first deployed server virtualization, the goal was to use a shared storage model for the virtual machines. During the first iteration, Microsoft IT would create one or two large logical unit numbers (LUNs) on the SAN (100-plus GB) for each host computer and then deploy multiple virtual machines per LUN. In a typical scenario, Microsoft IT gave the customer a 50-GB drive C and a 20-GB drive D. Because both drives were dynamic virtual disks, the actual space used on the LUN was much less than the maximum size.

However, over time, the dynamic disks grew as the customers stored data on the virtual servers, and just two or three virtual machines could fill an entire LUN. This became a significant management issue for Microsoft IT, which had to track all LUNs for space availability and then move virtual machines before all space was utilized.
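The over-commitment described above is easy to see with the numbers from the document (a 100 GB LUN, a 50 GB drive C and a 20 GB drive D per virtual machine). The snippet below is only an illustrative sketch: the function names are ours, and the 35 GB "half-grown" figure is an assumption.

```python
import math

LUN_GB = 100          # "100-plus GB" LUN from the document, taken at its lower bound
VM_MAX_GB = 50 + 20   # dynamic drive C plus dynamic drive D

def overcommit_ratio(lun_gb: float, vm_max_gb: float, vm_count: int) -> float:
    """Ratio between the space promised to the VMs and the space on the LUN."""
    return vm_count * vm_max_gb / lun_gb

def vms_to_fill(lun_gb: float, used_gb_per_vm: float) -> int:
    """Number of VMs that fill the LUN once each has grown to used_gb_per_vm."""
    return math.ceil(lun_gb / used_gb_per_vm)

print(overcommit_ratio(LUN_GB, VM_MAX_GB, 2))  # two fully grown VMs already exceed the LUN
print(vms_to_fill(LUN_GB, VM_MAX_GB))          # disks grown to their maximum size
print(vms_to_fill(LUN_GB, 35))                 # assumption: disks grown to half their maximum
```

Under these assumptions two fully grown virtual machines, or three half-grown ones, are enough to exhaust the LUN, which matches the "two or three virtual machines" observation.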

To address this issue and to enable failover clustering for the virtual machines, Microsoft IT next adopted a model of configuring just a single virtual machine per LUN. With this model, a LUN with 30 to 50 GB was dedicated to each virtual machine, with the option to give the virtual machines more space as required.

Microsoft IT has avoided using disk mount points, so the limiting factor for the number of virtual machines deployed on a host became the number of available drive letters on the host computers. In most cases, this meant not deploying more than 23 virtual machines on a host.
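The document does not explain where the figure 23 comes from. A plausible reading, and it is only our assumption, is that A and B are left for floppy drives and C holds the host operating system, which leaves 23 of the 26 letters for LUNs. A minimal sketch of that count:

```python
import string

# Assumption (not stated in the source): A and B are reserved for floppy
# drives and C is the host system volume.
RESERVED_LETTERS = {"A", "B", "C"}

def available_drive_letters(reserved: set[str]) -> list[str]:
    """Return the drive letters still free for one-LUN-per-VM storage."""
    return [letter for letter in string.ascii_uppercase if letter not in reserved]

free_letters = available_drive_letters(RESERVED_LETTERS)
print(len(free_letters), "letters available:", "".join(free_letters))
```

Any additional letter used on the host, for example by an optical drive, would lower the ceiling further, which fits the "in most cases" wording.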

  • Microsoft IT has achieved 99.95 percent availability for virtual machines running on Microsoft Virtual Server 2005 R2, and it anticipates that the availability will increase for virtual machines running on Hyper-V. Very few applications that have been deployed as virtual machines require a higher availability level.
  • With Windows Server 2008 failover clustering, an administrator must store each virtual machine on an individual LUN. Because an administrator must provide all cluster nodes with access to the same shared storage by using the same drive letters, 23 is the maximum number of virtual machines that can run in a failover cluster. Microsoft IT could work around this limitation by using mount points and virtual machine groupings, but it considers this configuration too complex to administer. Because of this limitation, Microsoft IT has adopted a standard of using only three nodes in a cluster, with the cluster configured to tolerate one node’s failure.
  • When virtual machines fail over in a Windows Server 2008 failover cluster, the cluster service with Hyper-V must save the virtual machine state, transfer the control of the shared storage to another cluster node, and restart the virtual machine from the saved state. Although this process takes only a few seconds, the virtual machine still is offline for that brief period. If an administrator has to restart all hosts in the failover cluster because of a security update installation, the virtual machines in the cluster have to be taken offline more than once. Therefore, Microsoft IT determined that highly available virtual machines could have more downtime than virtual machines deployed on stand-alone servers in the case of simple planned downtimes for host maintenance, such as applying software updates.
  • Because of the required brief outage every time a virtual machine is moved from one host to another, Microsoft IT found that coordinating the server update processes with virtual machine owners was difficult. Because one physical host could contain several virtual machines, Microsoft IT had to communicate with each of the virtual machine owners and coordinate host server maintenance with virtual machine maintenance.
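Why clustered virtual machines can end up with more planned downtime than stand-alone ones is easier to see in a small simulation. This is a rough sketch under our own assumptions: a three-node cluster patched one node at a time, and a deliberately naive policy that moves every virtual machine to the next node in the list. All node and VM names are hypothetical, and a smarter placement policy would reduce the number of moves.

```python
def rolling_update(placement: dict[str, str], nodes: list[str]) -> dict[str, int]:
    """Patch nodes in order and count the brief outages each VM suffers.

    Before a node restarts, its VMs fail over to the next node in the list;
    every such move costs the VM one save/transfer/restore outage.
    """
    placement = dict(placement)               # do not mutate the caller's mapping
    outages = {vm: 0 for vm in placement}
    for index, node in enumerate(nodes):
        target = nodes[(index + 1) % len(nodes)]
        for vm, host in list(placement.items()):
            if host == node:
                placement[vm] = target
                outages[vm] += 1
    return outages

nodes = ["node1", "node2", "node3"]
start = {"vm-a": "node1", "vm-b": "node2", "vm-c": "node3"}
outages = rolling_update(start, nodes)
print(outages)
```

With this naive policy the virtual machine that starts on the first node is interrupted three times during one patch cycle, whereas a virtual machine on a stand-alone host is interrupted once, when that host restarts.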

Because of these issues, Microsoft IT has not deployed failover clustering as the default standard for virtual machines. Microsoft IT has deployed several three-node clusters and does provide this service for virtual machines running critical workloads. One of the places where Microsoft IT is using failover clustering for virtual machines is in some branch offices that do not have 24-hour support staff on site. In a data center where administrators always are available to react to host downtime, Microsoft IT has minimized the use of Hyper-V clustering…
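To put the 99.95 percent availability figure quoted earlier into perspective, it can be converted into a yearly downtime budget. The conversion below is a simple sketch that assumes a 365-day year and counts all downtime, planned or not; the function name is ours.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

def yearly_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into hours of downtime per year."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

downtime = yearly_downtime_hours(99.95)
print(f"{downtime:.2f} hours per year")  # roughly 4.38 hours
```

A budget of a little over four hours per year explains why the few seconds lost in each failover matter less than how often those failovers have to be coordinated with virtual machine owners.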

The whole article is priceless and reading it is highly recommended (thanks to Vinternals for the link).
For those short on time, Microsoft has even published a webcast about this internal case study. The presenter is David Lef, a Microsoft IT Technology Architect.

How VMware is really using ESX

As we said, despite its market leadership, VMware had never revealed how it uses ESX and other products internally.
The first time the company disclosed details about its virtual infrastructure was in September 2008 at VMworld US. A refreshed presentation (DC35) was shown at VMworld Europe 2009 by Tayloe Stansbury, the company’s CIO.

  • VMware has an internal VDI deployment of over 550 users, including members of most departments.
    The client configuration includes Wyse V10 Thin Clients, Dell 24” monitors (configured at 1920×1200 pixels, 15bit resolution), keyboard and mouse.
    The server configuration runs on HP c7000 blade systems, EMC Clariion CX3-80 storage and Cisco 3020s switch modules for the HP blades.
    The entire infrastructure is powered by VMware Virtual Desktop Manager (VDM) 2.1 for US and View 3.0 for Europe.
  • VMware has an internal virtualized mail server deployment serving 7800 mailboxes.
    The entire infrastructure is powered by 29 virtual machines (split across two data centers) running Microsoft Exchange 2007 Enterprise Edition. 22 of them serve the mailboxes, while the other 7 work as Client Access Servers (CAS).
  • VMware virtualizes its entire ERP infrastructure except Oracle Real Application Clusters (RAC). 
  • 97% of the company servers are virtualized across one Tier 4 and two Tier 2 data centers.
    Just two applications are missing (one is Oracle RAC).
    EMC DMX4 is used as the storage backend of choice for mission-critical applications, otherwise EMC CX3-80 is the choice.
    The front-end servers of choice are HP c7000 blades everywhere.
  • The average consolidation ratio is 10:1 for servers and 64:1 for VDI desktops.
  • Each administrator manages an average of 145 virtual machines.
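The figures above allow a few back-of-the-envelope estimates. None of the derived numbers below appear in the presentation: they are our own arithmetic, assuming one virtual desktop per VDI user and an even spread of mailboxes across the 22 mailbox virtual machines.

```python
import math

VDI_USERS = 550        # "over 550 users", so this is a lower bound
VDI_RATIO = 64         # average desktops per host
MAILBOXES = 7800
MAILBOX_VMS = 22       # 29 Exchange VMs minus the 7 Client Access Servers

# Assumption: one virtual desktop per user.
vdi_hosts = math.ceil(VDI_USERS / VDI_RATIO)

# Assumption: mailboxes are spread evenly across the mailbox VMs.
mailboxes_per_vm = MAILBOXES / MAILBOX_VMS

print(f"At least {vdi_hosts} hosts for VDI")
print(f"About {round(mailboxes_per_vm)} mailboxes per mailbox VM")
```

Under these assumptions the VDI deployment fits on roughly nine blades and each mailbox virtual machine serves about 355 mailboxes.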

For those who cannot access the VMworld presentations (access requires a yearly subscription), VMware published a webcast about this internal case study.
