Tuesday, August 25, 2009

IOPS

Input/Output Operations Per Second (aka "IOPS") is a performance metric for hard disks that represents how many storage operations (reads/writes) the disk can execute in a given second. This, combined with how much actual data is passed per operation, determines the throughput of the disk. The distinction between the two matters because a 1-byte operation costs the same IOPS budget as a 1KB operation. (Assuming your drive can handle at least 1KB per operation.)
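The relationship is just multiplication. A quick sketch, with made-up numbers for illustration:

```shell
# Hypothetical figures: a disk sustaining 150 IOPS at an average
# operation size of 8 KB. Throughput is simply the product.
iops=150
op_size_kb=8
throughput_kb=$((iops * op_size_kb))
echo "${throughput_kb} KB/s"   # 1200 KB/s
```

The same 150 IOPS at 1-byte operations would move well under a kilobyte per second, which is why tiny I/Os are such a waste of the budget.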

You can get a rough view of the I/O load on your system via vmstat:

HOST::~$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 50432 388 1083600 0 0 407 290 1 0 1 5 88 6
0 0 0 49892 388 1084348 0 0 119 39 1295 1182 0 2 92 6
0 0 0 56856 388 1078296 0 0 324 99 1799 1972 0 4 86 10
0 0 0 56440 388 1078704 0 0 87 7 1673 1885 0 4 93 2
0 0 0 54880 388 1080472 0 0 320 32 1573 1518 0 2 89 8
0 0 0 53020 388 1082512 0 0 384 43 1556 1536 0 2 91 6
1 0 0 51568 388 1083736 0 0 232 45 1460 1453 0 1 90 8

Note that with vmstat, you can ignore the first line - it's a rollup average of stats since the last reboot, not a sample of the current interval.

The stats you want to look at are bi and bo under io, and wa under cpu. bi stands for "Blocks In" and bo for "Blocks Out". It's a bit counter-intuitive, though, in that they represent blocks moving into and out of the kernel's memory space, NOT into and out of the device. Because of that distinction, writes to a device actually show up as bo and reads from it as bi.
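You can see this for yourself by generating some write traffic while watching vmstat in another terminal. A minimal sketch (the path and sizes here are arbitrary):

```shell
# Write 50 MB to a scratch file. Run `vmstat 1` in another terminal
# while this goes; the writes show up as a spike in the bo column.
# conv=fsync forces the data to actually reach the disk before dd exits.
dd if=/dev/zero of=/tmp/bo_demo bs=1M count=50 conv=fsync
bytes=$(wc -c < /tmp/bo_demo)
echo "wrote ${bytes} bytes"
rm /tmp/bo_demo
```

A read-heavy command (e.g. dd with the scratch file as the input) would spike bi instead - provided the file isn't already sitting in the page cache.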

wa stands for "wait time" and tells you how much of your CPU time is spent idle while waiting for outstanding I/O to complete. If your wait time is high, your device I/O is getting choked - probably from insufficient IOPS capacity. You can track down which device it is using iostat, though figuring out which process is responsible is still a bit of an art. (I'm pretty sure the per-process I/O accounting hooks are only just starting to make their way into the kernel.)

In a RAID environment, the number of IOPS is an aggregation across all the disks that are active within the RAID group. (It doesn't include any online spares that are sitting around idle.) This is essentially the speed limit for a group of disks. If you attempt to perform more I/O operations than your RAID can support, the device or host will have to start queueing them, which leads to dramatically increased CPU wait time.

The per-drive IOPS numbers take a little digging to find, so I wanted to drop them here for future reference:

SAS - Serial Attached SCSI - 175 for 15K, 125 for 10K
SATA - Serial ATA - 100 +/-
FC - Fibre Channel - 200 for 15K, 150 for 10K

Note that THESE NUMBERS WILL CHANGE. They represent the current state of things based on my interactions with vendors.

Multiply these numbers by the number of drives that are part of the active RAID and you'll know the theoretical upper bound of your performance. (Which you can then compare to your vmstat output to see where you're landing.)
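For example, using the (vendor-dependent) per-disk figure from the table above, an 8-disk group of 15K SAS drives works out to:

```shell
# Assumed per-disk figure from the table above: 175 IOPS for 15K SAS.
disks=8
iops_per_disk=175
raid_iops=$((disks * iops_per_disk))
echo "theoretical ceiling: ${raid_iops} IOPS"   # 1400 IOPS
```

Keep in mind this is a ceiling, not a target - RAID write penalties and the read/write mix of your workload will pull the real number down from there.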

A great article with more information on how to calculate IOPS:

http://storagearchitect.blogspot.com/2008/09/how-many-iops.html