Wednesday, September 23, 2009

egrep - LOVE it

'egrep' is equivalent to running grep with the '-E' switch, which enables extended regular expressions.

The reason I especially love it is its ability to use inclusive "or's". In other words, give me all the lines with x, y or z.

So if you want to see a count of all the mails sent to your exim daemon from, say, 5 pm through the 8 pm hour on 9/22, you can simply execute the following:

egrep '(2009-09-22 17|2009-09-22 18|2009-09-22 19|2009-09-22 20)' mainlog.1 | grep dnslookup | wc -l

And voila! You will have your answer.
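
Incidentally, the same match can be written a bit more tightly by factoring the common date prefix out of the alternation (just a variation on the pattern above, against the same mainlog.1):

egrep '2009-09-22 (17|18|19|20)' mainlog.1 | grep dnslookup | wc -l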

Monday, September 21, 2009

Reading Top on Multi-core Systems

Take this sample output:

top - 11:24:56 up 2 days, 11:19, 2 users, load average: 0.45, 0.34, 0.32
Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.1%us, 0.4%sy, 0.0%ni, 96.3%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8174036k total, 7693556k used, 480480k free, 260924k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5043372k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 28 18.7 133:33.65 java
3388 root 18 0 67692 2240 1732 S 0 0.0 0:01.95 rotatelogs
16890 nsmc 15 0 12736 1144 816 R 0 0.0 0:00.51 top

What we’re seeing seems completely contradictory - the java process is consuming 28 percent of a cpu's resources, but total user consumption on the system is only about 3 percent?

Basically, the Cpu(s) rollup on a multicore system is an over-simplification: it's an average across all the cores (in this case, eight), while the per-process %CPU column is measured against a single core.
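
As a rough sanity check using the numbers above:

28% of one core / 8 cores = 3.5%

which lines up with the ~3.1%us shown on the Cpu(s) summary line.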

Using the interactive command ‘1’ (that’s the number one) while top is running, you get a per-core breakdown instead:

top - 11:28:25 up 2 days, 11:23, 2 users, load average: 0.14, 0.21, 0.27
Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie
Cpu0 : 13.3%us, 1.3%sy, 0.0%ni, 84.1%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu1 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.7%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8174036k total, 7754924k used, 419112k free, 261028k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5104464k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 20 18.7 134:23.33 java

That makes a LOT more sense. So the default output shows the java process's cpu consumption measured against a single core, but the rollup as an average across all the cores. This per-core view shows how the load is actually getting distributed.

By the same token, if you go back to the default output and hit ‘I’ (that’s a capital i, which toggles top’s “Irix mode” off), you will get a cpu consumption number for the java process divided by the number of cores:

top - 11:31:31 up 2 days, 11:26, 2 users, load average: 0.14, 0.22, 0.26
Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.0%us, 0.5%sy, 0.0%ni, 96.0%id, 0.3%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 8174036k total, 7808624k used, 365412k free, 261136k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5156848k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 3.5 18.7 135:04.92 java
16890 nsmc 15 0 12736 1144 816 R 0.0 0.0 0:01.32 top
1 root 15 0 10344 680 572 S 0.0 0.0 0:02.47 init

So now you’re consistent again.

I’m not sure why these two aren’t viewed from the same perspective by default.
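
A quick way to confirm how many cores that summary line is averaging over (not part of top itself, just a sanity check):

grep -c ^processor /proc/cpuinfo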

Tuesday, August 25, 2009

IOPS

Input/Output Operations Per Second (aka "IOPS") is a performance metric for hard disks that represents how many storage operations (reads/writes) the disk can execute in a given second. This, combined with how much data is actually moved per operation, determines the throughput of the disk. The distinction matters because a 1-byte operation costs the same single I/O operation as a 1KB operation. (Assuming your drive can handle at least 1KB per operation.)
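
A quick worked example of that difference (illustrative numbers): a drive that can sustain 175 IOPS doing 4KB operations tops out around 175 x 4KB = 700KB/s, while the same 175 IOPS at 64KB per operation is roughly 11MB/s. Same operation count, very different throughput.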

You can get a rough view of the I/O your system is doing via vmstat:

HOST::~$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 50432 388 1083600 0 0 407 290 1 0 1 5 88 6
0 0 0 49892 388 1084348 0 0 119 39 1295 1182 0 2 92 6
0 0 0 56856 388 1078296 0 0 324 99 1799 1972 0 4 86 10
0 0 0 56440 388 1078704 0 0 87 7 1673 1885 0 4 93 2
0 0 0 54880 388 1080472 0 0 320 32 1573 1518 0 2 89 8
0 0 0 53020 388 1082512 0 0 384 43 1556 1536 0 2 91 6
1 0 0 51568 388 1083736 0 0 232 45 1460 1453 0 1 90 8

Note that with vmstat, you can ignore the first line of output - it's an average of the stats since boot, not a current sample.

The columns to watch are bi and bo under io, and wa under cpu. bi stands for "Blocks In" and bo for "Blocks Out". It's a bit counter-intuitive, though, in that they are measured relative to the kernel's memory, NOT the device. Because of that, reads from a device show up as bi (blocks coming in to memory) and writes to a device show up as bo (blocks going out to the device).

wa is I/O wait and tells you what percentage of cpu time is spent idle waiting for I/O to complete. If your wait time is high, your device I/O is getting choked - probably from insufficient IOPS capacity. You can track down which device it is using iostat, though figuring out which process is responsible is still a bit of an art. (I'm pretty sure the hooks for doing that are only just starting to make their way into the kernel.)
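
For the per-device view, something like the following is the usual starting point (a standard sysstat invocation; the exact columns vary a bit by version):

iostat -x 5

Watching the %util and await columns will generally point you at the saturated device.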

In a RAID environment, the number of IOPS is an aggregation of all the disks that are active within the RAID group. (It doesn't include any online spares that are sitting around idle.) This is essentially the speed limit for a group of disks. If you attempt to perform more I/O operations than your RAID can support, the device or host will have to start queueing them which will lead to dramatically increased CPU wait time.

The performance numbers for IOPS take a little digging to find so I wanted to drop them here for future reference:

SAS (Serial Attached SCSI): ~175 for 15K RPM, ~125 for 10K RPM
SATA (Serial ATA): ~100, give or take
FC (Fibre Channel): ~200 for 15K RPM, ~150 for 10K RPM

Note that THESE NUMBERS WILL CHANGE. They represent the current state of things based on my interactions with vendors.

Multiply these numbers by the number of drives that are part of the active RAID and you'll know the theoretical upper boundary of your performance. (Which you can then compare to your vmstat to see where you're landing.)
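
For example, an active RAID group of eight 15K SAS drives gives a ballpark ceiling of 8 x 175 = 1400 IOPS - before accounting for any RAID write penalty, which lowers the effective number for writes.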

A great article for more information on how to calculate IOPS/etc:

http://storagearchitect.blogspot.com/2008/09/how-many-iops.html

Friday, July 31, 2009

How to trace end-to-end connections on the Netscaler Load-balancer


First, the easiest way to see the source IPs for incoming traffic is via the ASA firewall logs in syslog.

So to see where ftp connections are coming from, you could use something like

grep [ftp cluster vip] syslog | grep -v ICMP | grep -v [monitoring host's ip] | grep -v local-host


That greps for the NetScaler virtual IP of the FTP cluster in the syslog file and filters out ICMP and monitoring-host traffic. (As well as connections to itself.)

Additionally, you can see all the active connections on a netscaler by ssh'ing to the CLI and running:

> show connectiontable | grep [ftp cluster vip]
[client ip] 45534 [ftp cluster vip] 21 FTP 7 TIME_WAIT
[client ip] 32570 [ftp cluster vip] 21 FTP 9 ESTABLISHED


That shows you where they're coming from. To find out where they're going to, also, you need to check the persistent connections:

> show persistence
Type SRC-IP DST-IP PORT VSNAME TIMEOUT REF_CNT
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 103 1
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 0 1
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 75 0
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 91 0


NOTE: This only works on vservers where persistence is handled by source-ip.

In the case of HTTP traffic, you can have the NetScaler insert a header containing the original client IP into the requests that hit the backend services.
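
For that to be useful, the backend also has to log the header. Assuming the backends run Apache and the header is named X-Forwarded-For (both assumptions - your header name and log path may differ), a log format like this captures it:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" proxied
CustomLog /var/log/apache2/access_proxied.log proxied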

There are also a number of NetScaler products that allow you to do extensive log analysis.

Monday, July 27, 2009

Enabling Persistent Routes on a Debian Host

1. su to root
2. cd to /etc/network/
3. Copy off the interfaces file to interfaces.DATE (or what have you)
4. Add lines of the following form under the primary network interface definition:

up route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.2.1.1
down route del -net 10.1.1.0 netmask 255.255.255.0

So you should end up with something like this:

iface bond0 inet static
address 10.2.1.5
netmask 255.255.255.0
network 10.2.1.0
gateway 10.2.1.1
up /sbin/ifenslave bond0 eth0
up /sbin/ifenslave bond0 eth2
up route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.2.1.1
down route del -net 10.1.1.0 netmask 255.255.255.0


That creates a route to the 10.1.1.x network for the host with the ip 10.2.1.5 through the 10.2.1.1 router whenever the interface goes up. (It also removes it whenever the interface goes down.)
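
To pick up the new route immediately without bouncing the interface, you can run the same route command by hand and then verify it took (using the example addresses above):

route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.2.1.1
route -n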

How to Disable and Clean Netapp Snapshots

To disable snapshots on a NetApp volume, first turn on the volume's nosnap option:

vol options volume_name nosnap on


and then zero out the automatic snapshot schedule (the three numbers are the weekly, nightly, and hourly snapshot counts):

snap sched volume_name 0 0 0


If you need to reclaim the space the snapshots are using, you can delete the old snaps.

Run

snap list volume_name


to find them, then

snap delete volume_name snap_name


to delete them.
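
To confirm the space actually came back, df on the filer shows the volume's snapshot usage on its .snapshot line (assuming a 7-mode filer, and substituting your own volume name):

df /vol/volume_name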

Friday, July 17, 2009

How To Use Netapp SnapMirror

I. To Create a Snapmirror Relationship:

Create source and destination volumes of the same size, on aggregates of the same size. (This is critical if you ever want to reverse the direction of the sync.)

Go into FilerView > Volumes on the DESTINATION and mark the volume OFFLINE.

Go into FilerView > SnapMirror > Add on the DESTINATION and proceed through accepting all the defaults except, obviously, the volume names.

On the SnapMirror > Manage screen, click the Advanced properties of the new job. Inside the job, click "Initialize". This will wipe the target volume and perform the first sync. Subsequent syncs then run automatically on the schedule you chose which, if you used the defaults, is every minute.
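
If you'd rather do the same thing from the CLI, the rough equivalent (7-mode syntax, with placeholder filer and volume names) is to run this on the DESTINATION:

snapmirror initialize -S source_filer:source_vol destination_vol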


II. To mark a Snapmirror RW:

End the SnapMirror relationship with the

snapmirror break

command. This command changes the destination's status from

snapmirrored


to

broken-off


thus making it writable.

When you're ready to resync them, run the

snapmirror resync


command on the DESTINATION. This will change a former destination's status back to snapmirrored and will resynchronize its contents with the source.

(NOTE: When applied to a former source, snapmirror resync can turn it into a mirror of the former destination. In this way, the roles of source and destination can be reversed.)
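
As a rough sketch of that reversal (placeholder names, run from the filer noted in each step): on the original DESTINATION filer, make the mirror writable with

snapmirror break dest_vol

then, on the original SOURCE filer, turn it into the new mirror with

snapmirror resync -S dest_filer:dest_vol source_vol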

At any time, you can see the status of all the snapmirrors by running the

snapmirror status

command.