Tech: Finding a Better Way to Estimate IOPS for VDI

Paul Wilson posted a lengthy article on the Citrix blog sharing his insight into how to better estimate IOPS for VDI solutions. He states, though, that the only way to prevent under-sizing the storage tier is to run a pilot and analyze its results.

For an estimate you would normally take the average IOPS per user and multiply it by the number of users to determine the storage requirements, but Wilson states that this is not sufficient.

Currently IOPS are estimated by taking into account so-called boot storms (when the machines start up) and login storms (when the users log on). He states that in an ideal world you would record these IOPS and use the highest value for your IOPS estimation, basically the same approach Citrix used in its paper: XenDesktop 4.0 planning guide for hosted VM-Based resource allocation. This model is the safest approach but most probably not the most cost-effective, and it will drive the ROI of VDI down significantly. A better approach in this case would be to take the average IOPS for the user workload and add 10 to 20% extra as a buffer. Translated into a formula, this would look something like this:

Login IOPS = MaxSimultaneousUsers * Average Login IOPS (Incremental IOPS for new user logons)

Workload IOPS = MaxSimultaneousUsers * Average Workload IOPS (IOPS when all users are online)

Peak IOPS = Workload IOPS + Login IOPS (Theoretical maximum when all users are online and the last set logs in)

SAN Capacity = Peak IOPS + 20% buffer
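The four formulas above can be chained in a few lines. The following is a minimal sketch only: the function name, the parameter names and the sample numbers are made up for illustration and do not come from Wilson's article.

```python
def legacy_san_capacity(max_simultaneous_users, avg_login_iops,
                        avg_workload_iops, buffer=0.20):
    """Apply the traditional estimate: peak IOPS plus a percentage buffer."""
    # Incremental IOPS for new user logons
    login_iops = max_simultaneous_users * avg_login_iops
    # IOPS when all users are online
    workload_iops = max_simultaneous_users * avg_workload_iops
    # Theoretical maximum: everyone online while the last set logs in
    peak_iops = workload_iops + login_iops
    return peak_iops * (1 + buffer)


# Hypothetical inputs: 100 users, 20 login IOPS and 10 workload IOPS each
print(legacy_san_capacity(100, 20, 10))
```

Because this adds login IOPS for every user on top of the full workload, it assumes all users log in at the same moment.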

His experience was that the preceding way of estimating doesn’t work at all: during his projects he had to order more storage so that the required IOPS could be provided.

Based on that field experience, Wilson decided to create a new model that makes the following assumptions:

  • Desktops are in either a login state or a workload state.
  • Desktops start in the login state and move to the workload state based on the Login Time parameter.
  • Desktops enter the login state at the rate defined by the Launch Rate parameter.
  • Desktops in the login state and desktops in a workload state have different IOPS requirements.
  • Read IOPS are ignored because they amounted to less than 2% of the total IOPS.

His model uses several variables:

  • Number of Desktops, noting that when using multiple clusters you should determine the value per cluster.
  • Launch Rate, which details how quickly users will be logging into the system, expressed as the number of users that log in per second.
  • Login IOPS, detailing the number of IOPS during the login of a user.
  • Workload IOPS, detailing the number of IOPS expected per user during the workload execution.
  • Desktop Login Time, the amount of time it takes a user to log in to the desktop, measured from Ctrl+Alt+Del to shell initialization complete.

This results in the following formulas:

  • Peak IOPS, which represents the peak IOPS expected:

Peak IOPS = MAX((DesktopsInLoginState * Login IOPS) + (DesktopsInWorkloadState * Workload IOPS))

  • Steady-State IOPS, which represents the estimated IOPS during normal workload:

Steady-State IOPS = Number of Desktops * Workload IOPS

  • Estimated Boot IOPS, which represents the estimated IOPS required to boot all the machines. Wilson notes that this formula has the lowest confidence level, because it is based on a correlation that may not hold in the future or with other hypervisors:

Estimated Boot IOPS = Number of Desktops * 22
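To make the MAX in the Peak IOPS formula concrete, the model can be stepped through one second at a time. This is a rough sketch under assumptions that are not in the article: the launch rate is constant, time advances in one-second steps, and all function and parameter names are invented for illustration.

```python
import math


def peak_iops(num_desktops, launch_rate, login_iops, workload_iops, login_time):
    """Return the highest combined IOPS seen while desktops log in.

    launch_rate is in logins per second, login_time in seconds (assumed units).
    """
    # Simulate until the last desktop has finished logging in
    horizon = math.ceil(num_desktops / launch_rate) + math.ceil(login_time)
    peak = 0
    for t in range(horizon + 1):
        # Desktops that have started logging in by second t
        launched = min(num_desktops, launch_rate * t)
        # Desktops that have moved on to the workload state by second t
        in_workload = min(num_desktops, launch_rate * max(0, t - login_time))
        in_login = launched - in_workload
        total = in_login * login_iops + in_workload * workload_iops
        peak = max(peak, total)
    return peak


def steady_state_iops(num_desktops, workload_iops):
    return num_desktops * workload_iops


def estimated_boot_iops(num_desktops, iops_per_desktop=22):
    # 22 is the multiplier quoted in the formula above
    return num_desktops * iops_per_desktop


# Hypothetical inputs: 100 desktops, 1 login per second, 30 s login time
print(peak_iops(100, 1, 20, 10, 30))
print(steady_state_iops(100, 10))
print(estimated_boot_iops(100))
```

With these sample inputs only a fraction of the desktops is in the login state at any moment, so the simulated peak stays well below what you get by adding login IOPS for every user to the full workload.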


The Workload IOPS depend on the type of users you have within your VDI environment. These users can be divided into:

  • Light user: ~6 IOPS per concurrent user. This user is working in a single application and is not browsing the web.
  • Normal user: ~10 IOPS per concurrent user. This user is probably working in a few applications with minimal web browsing.
  • Power user: ~25 IOPS per concurrent user. This user usually runs multiple applications concurrently and spends considerable time browsing the web.
  • Heavy user: ~50 IOPS per concurrent user. This user is busy doing tasks that have high I/O requirements like compiling code or working with images or video.

Using the proportion of each of these user types, you can determine a loaded (weighted average) rate for the environment. For instance, when you have 20% Light, 50% Normal, 20% Power and 10% Heavy users:

Loading IOPS = Light (.20*6) + Normal (.5*10) + Power (.2*25) + Heavy (.1*50) = 16.2
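The same weighted average can be computed for any user mix. A small sketch follows; the per-type IOPS values are the ones listed above, while the dictionary layout and the function name are illustrative choices.

```python
# Approximate IOPS per concurrent user, as listed above
IOPS_PER_USER_TYPE = {"light": 6, "normal": 10, "power": 25, "heavy": 50}


def loading_iops(user_mix):
    """Weighted average IOPS per user; user_mix maps user type to its share."""
    total_share = sum(user_mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("user shares must add up to 1.0")
    return sum(share * IOPS_PER_USER_TYPE[kind]
               for kind, share in user_mix.items())


mix = {"light": 0.20, "normal": 0.50, "power": 0.20, "heavy": 0.10}
print(round(loading_iops(mix), 1))
```

The result is a per-user figure, so it would be used as the Workload IOPS value rather than as a total for the environment.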

If you do have the ability to perform a pilot, that is still preferred, so you can analyze the actual IOPS. This is done by determining the peak average IOPS: look at the average IOPS for each user during the pilot and select the highest value for the calculations, resulting in the following formula:

Peak Average IOPS = MAX (AvgIOPSuser1, AvgIOPSuser2, … AvgIOPSuserN)
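As an illustration, the formula amounts to averaging each user's samples and keeping the largest average. The sketch below assumes the pilot data is available as a mapping from user to a list of IOPS samples; that layout and the sample values are hypothetical.

```python
from statistics import mean


def peak_average_iops(samples_by_user):
    """Return the highest per-user average IOPS observed during the pilot."""
    return max(mean(samples) for samples in samples_by_user.values())


# Made-up pilot measurements, one list of IOPS samples per user
pilot = {
    "user1": [8, 12, 10],
    "user2": [20, 30, 25],
    "user3": [5, 7, 6],
}
print(peak_average_iops(pilot))
```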

You should also take time to understand the limitations of the storage environment you intend to use. You can do this by reviewing how much of its capacity is used when boot and login storms occur. Using the following formulas you can determine boot and login storm size.

Boot Storm Size = Total IOPS available / 300

Login Storm Size = Total IOPS available / 100
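A quick sketch of both formulas follows. It reads the result as the number of desktops that can boot or log in at the same time, which is an interpretation rather than something stated above; the function name, the rounding down and the sample figure are likewise assumptions.

```python
def storm_sizes(total_iops_available, boot_divisor=300, login_divisor=100):
    """Return (boot storm size, login storm size) for a storage tier.

    The divisors 300 and 100 are the constants from the two formulas above.
    Results are rounded down, on the assumption that they count whole desktops.
    """
    boot_storm_size = total_iops_available // boot_divisor
    login_storm_size = total_iops_available // login_divisor
    return boot_storm_size, login_storm_size


# Hypothetical storage tier delivering 30,000 IOPS
boot, login = storm_sizes(30000)
print(boot, login)
```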

Wilson notes that there are many vendors that can increase the performance of the storage tier, sometimes at a fraction of the cost of purchasing that same performance from the SAN vendor.