Guest star authors: Ronald Oglesby, Director of Architecture-Virtualization Services, and Dan Pianfetti, Principal Consultant, at GlassHouse Technologies.
Patch Tuesday for VMware: sounds kind of silly, doesn’t it? At least it did to us before we did some research on the patches coming out of VMware for ESX Server. This all started a few days ago when we began looking into a network issue some VMs were having. After sorting through the available downloads and patches, and talking to support, we found there was already a patch for this issue.
Nice. Great. Why wasn’t this installed? Too many patches? Admins don’t think they need them?
Whatever the reason, it is starting to become a trend in some ESX environments: not all patches are installed by the admins. The reason for this is pretty simple. We already have Patch Tuesday for the Microsoft servers we are dealing with, patches for applications that app owners install, patches for SQL, Exchange, etc., and of course desktop patching. Sorting through ESX patches is often a secondary job for Windows administrators tasked with maintaining ESX, and if ESX is working, patching it falls to the bottom of the pile. I mean, this is VMware’s ESX Server! The product that we used to tell people didn’t need patching that often, since there wasn’t much code to have to patch. But recently we have started to notice a change, and have had to stop telling people that patches for ESX were few and far between.
To be rational about our assertion we started by looking at the available data on patches for ESX. We couldn’t get data all the way back to ESX 1.5, since VMware’s site has been revamped several times and those patches are no longer available, and quite honestly, who saves patches all the way back to 2003/4 anyway? But what we found in the data was pretty telling. The first item we noticed was the sheer number of patches for ESX 3.0.1: 68! Sixty-eight patches in the course of about a year. Of course they were released in about 11 groups, at an average of about 7 patches per release date (per the VMware website).
Of those 68 patches, 17 were considered Critical patches (an average of 1.4 per release), 21 were security related (an average of 1.75 per release) and 30 were General patches, averaging 2.5 patches per release date. The other thing we noticed (besides the number of patches) was the frequency at which patches were released. Essentially, the time between release dates continues to shrink.
The chart above shows the average number of calendar days between patches by version of ESX Server. If you are an ESX expert, you will note that some minor versions of ESX that were not widely adopted, or had a small number of fixes, have been filtered from this list. The other thing to notice is the red normalized line. This normalized line is used ONLY for 3.0.0 and 3.0.1. After 3.0.0 was released there wasn’t a patch available for about 100 days. We believe this is due to the slow adoption of 3.0.0 at first release, so the normalized line only takes into account the time between patches after the release of the first patch for that version.
So why make this chart and look at the time between patches? Let’s take a hypothetical server built on July 2nd of 2007, almost exactly 5 months ago. Since being built on that day and put into production, that server would have been put into maintenance mode and patched/updated eight times. That’s right, eight (8) times in 5 months. How did this happen? Let’s look at the following timeline:
Wow, huh? This server has been put into maintenance mode on average every 19 calendar days (less than three weeks) over 5 months… Now expand that to an environment with a couple of 10-node clusters.
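To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It is only a back-of-the-envelope illustration: the 150-day period (5 months at 30 days each), the reading of "a couple of 10-node clusters" as two clusters, and the function names are our own assumptions, not VMware figures.

```python
# Figures from the example server: 8 patch events in roughly 5 months.
PATCH_EVENTS = 8
DAYS_IN_PERIOD = 150  # assumption: 5 months counted as 30 days each


def average_interval(days, events):
    """Average number of calendar days between patch events."""
    return days / events


def fleet_maintenance_events(hosts, events_per_host):
    """Total times a host enters maintenance mode, assuming every host
    gets every patch event (hypothetical worst case)."""
    return hosts * events_per_host


interval = average_interval(DAYS_IN_PERIOD, PATCH_EVENTS)

# Assumption: "a couple of 10-node clusters" is read as 2 clusters x 10 hosts.
hosts = 2 * 10
total = fleet_maintenance_events(hosts, PATCH_EVENTS)

print(f"{interval:.2f} days between patch events on one host")
print(f"{total} host maintenance events across {hosts} hosts")
```

Under those assumptions, one host averages 18.75 days between patch events, and twenty hosts add up to 160 trips into maintenance mode over the same five months.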
At this point, some readers may point out that the general patches may not be needed by all implementations. This may be one reason VMware has separated the patches instead of releasing one big patch/update on each release date containing all the fixes. While it is true that not ALL general patches are needed, most are. If you look at some of the general patches for 3.0.1 or 3.0.2 you will see that they affect some of the basic components of ESX that everyone uses, or contain fixes for commonly used components: iSCSI updates, updates to the e1000 driver, a fix for time gains in Windows, etc. So these general patches cannot be ignored in most environments, and if you have failed to install one (like the Windows time issue fix, let’s say) and then experience the problem, it is your head on the chopping block for not patching and keeping up to date.
I guess the point of this article is to wonder what is behind the increase in the number and frequency of patches for ESX. As we stated earlier, we used to tell clients that ESX was a piece of infrastructure with very few moving parts, and therefore very few patches when compared to Windows, that could generally be treated like an appliance. The issue we now see is that in VMware’s quest to support more hardware, add more features, and keep MS at bay with their advanced technology, they seem to be focusing more on “which whiz-bang can we put in today” than on “how can we make this the most stable enterprise platform available?” I mean, at what point did we trade the idea of a “small” hypervisor for something stuffed so full of components that it needs to be patched every 18.75 days (in the case of the example server)?
We are not here to beat VMware over the head for patching/updating their product. Obviously if something is broken it needs to be fixed. Instead we are wondering where their focus is, and pointing out a larger problem in the virtualization world. Companies are moving unbelievably fast in an attempt to create new features, stay ahead of the game and basically be the leader in whatever virtualization niche they are in. But at what cost? And is it worth it to the client? If a client buys into the idea of server virtualization as a piece of infrastructure (like a SAN or a switch), only to see the kind of patching we see in Windows, they are going to get smacked in the face with the reality that these are SERVERS. The reality is that the vendors are sticking so much into the OS that patches are going to happen just as often as with Windows servers… Or, if the client believes in the stability/rock solidness and skips a majority of the general patches, they wind up with goofy time issues or other problems with iSCSI, until they catch up.
VMware, the largest player in the game, seems to be moving at such a fast pace that they are soon going to need a Patch Tuesday (kind of like MS). Patch Tuesday wasn’t invented because people hate Mondays and needed a reason to hate Tuesdays. Patch Tuesday was needed because patches came out randomly, from different groups and at different times, requiring numerous resources to constantly review and implement them. Now Microsoft releases the patches all at once, and Windows admins can simply slam them all down simultaneously. Sooner or later (if the trend continues) we may need to do the same thing for ESX, and I’ll bet VMware is seeing the same thing. Notice how patching tools are in the works for ESX (and some pieces are already available in the OS)? And third-party tools are already available to make life easier for Windows admins trying to keep up with their ESX environment.
Maybe it’s time to slow down and look at this as a QA issue? Maybe it’s time to stop thinking about these platforms as rock solid, few moving parts systems? Maybe it’s better for us not to draw attention to it, and instead let it play out and the markets decide whether all this patching is a good thing or not. Obviously patching is a necessary evil, and maybe because we are so used to it in the Windows world, we have ignored this so far. But a patch every 18.75 days for our “hypothetical” server is a bit much, don’t you think?
About the authors
Ron Oglesby is the Director of Architecture-Virtualization Services at GlassHouse Technologies and the co-author of ESX Server – Advanced Technical Design Guide and VMware Virtual Infrastructure 3 – Advanced Technical Design Guide.
Dan Pianfetti is a Principal Consultant at GlassHouse Technologies and specializes in VMware implementations in enterprise environments.
Update: VMware answered this post on its own corporate blog.