IBM publishes a study on Amazon EC2 support issues and response time

A few days ago IBM published a very interesting study about the support model used by Amazon for its Infrastructure-as-a-Service cloud platform: EC2.

The seven-page paper summarizes an analysis of the Amazon support forums between August 2006 and December 2009, and highlights how the cloud provider struggles to keep up with a large volume of troubleshooting requests:

…We found that, with the exception of problems related to application-level issues, the observed problems are roughly evenly divided among the remaining problem classes (e.g., connectivity, virtual image management, performance, etc.). In studying the evolution of problems over time, we find that some classes of problems are closely related to the introduction of new cloud features as users incorporate them into their deployments. Other problems, such as those related to connectivity, are persistent and are relatively less affected by new functionality. We see evidence that some problem classes (e.g., related to image management) diminish significantly as cloud providers introduce new interfaces and tooling to simplify certain operations. The level of involvement from the cloud operator in solving problems also is dependent on the problem class, and changes over time. Some classes, such as those problems related to virtual infrastructure components, require operator involvement 50% of the time. While operator involvement in solving problems does generally decrease over time, some problems consistently require help from the cloud operator…

[Figure: Amazon EC2 support issues]

Our key observations are: (i) Users encounter many problems in trying to setup their instance and to keep the instance running. (ii) Of the many type of problems faced by the users, we observe that those related to managing virtual resources and instance performance grow over time. In addition, we find that these two types of problems require the most involvement from cloud administrators because users are ill-equipped with the appropriate tools to debug these two classes. (iii) The addition of new features results in a temporary increase in the number of problems reported – the number of problems subside as user become familiar and the provider perfects the method of delivery for these features. (iv) Our investigation of the cloud support model shows that the support staff grows in proportion to the increasing number of customers posting in the forum. Administrators usually respond to posts within 10-12 hours, but problem resolution can take days.

Thanks to Network World for the news.