Monday, November 2, 2009

Formatting Tool Output for Reports

If you're like me, you use cron to mail you a variety of stats generated during off-peak hours.

You may, at some point, want to start extrapolating patterns from the data you've collected. In my experience, nothing really gets the job done in this scenario quite like a spreadsheet.

That being said, the data generated by most tools is reader-friendly, not import-friendly, so it has to be munged before you can add it to your data set.

Most of them use tabs to do their spacing/formatting. I have written a quick sed hack that removes one or more instances of a tab from a line and converts it to a comma. The end result is essentially a CSV.

So, to format the output of a

du -sm /home/*

you could pipe it through sed and into a mail command.

Ie:

du -sm /home/* | sed -e '/txt/d;s/\t\t*/,/' | mail -s "Usage" nathan@mccourtney.com

the '-e' flag tells sed that what follows is a script expression; within that expression, multiple commands are separated by a ';'.

In this case, I'm using the first part of the command to delete every line with 'txt' in it, since I'm really not worried about the text files. The second part matches at least one tab and all the tabs following it, then replaces them with a single ','. The output is then piped to a mail command and sent to my account.

When it shows up, the output goes from

nsmc@bangkok:~$ du -sm /home/*
1786 /home/nathan

to

1786,/home/nathan

It's not terribly pretty, but you can easily import it into a spreadsheet. And really, I use a slightly more complex du script that generates a more readable listing before it even gets to sed. This was intended just to show the sed side of it.
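
For what it's worth, here's a rough sketch of what a slightly fuller pipeline could look like - the sort and the CSV header line are illustrations of mine, not the script mentioned above:

# Hypothetical example, not the actual script: sort by size, drop txt entries,
# convert tabs to commas, and prepend a CSV header before mailing.
( echo "megabytes,path"; du -sm /home/* | sort -rn | sed -e '/txt/d;s/\t\t*/,/' ) | mail -s "Usage" nathan@mccourtney.com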

Tuesday, October 20, 2009

ulimit - Runtime system resource tuning

When you're dealing with daemons, you're likely to run into the system's process-level constraints. In addition to the kernel params that are set in the setup scripts, there are process limits you need to be aware of to keep from bumping your head.

The shell has a built-in command to query and set those values: ulimit.

ulimit: usage: ulimit [-SHacdfilmnpqstuvx] [limit]
brutus:~ nsmc$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 266
virtual memory (kbytes, -v) unlimited

As you can see, there are very tight limits on this system for the number of processes a user can run, as well as the number of files any of those processes can have open.

These values can be set using a basic syntax:

#ulimit -n 9162

However, they won't survive a reboot. Since these limits are tied to the shell process that invokes them, you need to set them in /etc/bashrc (or whatever file your shell takes its initialization parameters from) in order to have them come up automatically.
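
For example, a minimal sketch (assuming a bash-style setup where /etc/bashrc is read by every new shell - the path varies by distro):

# Sketch: raise the open-files limit for all new shells.
echo 'ulimit -n 9162' >> /etc/bashrc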

Wednesday, September 23, 2009

egrep - LOVE it

'egrep' is equivalent to grep invoked with the '-E' switch, which enables extended regular expressions.

The reason I especially love it is its ability to use inclusive "or's". In other words, give me all the lines with x, y or z.

So if you want to see a count of all the mails sent to your exim daemon from say, 5 to 8 pm on 9/22, you can simply execute the following:

egrep '(2009-09-22 17|2009-09-22 18|2009-09-22 19|2009-09-22 20)' mainlog.1 | grep dnslookup | wc -l

And voila! You will have your answer.
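
Incidentally, the alternation can be written more compactly with grouping - same log format assumed:

egrep '2009-09-22 (17|18|19|20)' mainlog.1 | grep dnslookup | wc -l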

Monday, September 21, 2009

Reading Top on Multi-core Systems

Take this sample output:

top - 11:24:56 up 2 days, 11:19, 2 users, load average: 0.45, 0.34, 0.32
Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.1%us, 0.4%sy, 0.0%ni, 96.3%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8174036k total, 7693556k used, 480480k free, 260924k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5043372k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 28 18.7 133:33.65 java
3388 root 18 0 67692 2240 1732 S 0 0.0 0:01.95 rotatelogs
16890 nsmc 15 0 12736 1144 816 R 0 0.0 0:00.51 top

What we’re seeing seems completely contradictory - the java process is consuming 28 percent of the cpu's resources, but the total user process consumption on the system is 3 percent?

Basically, a rollup of CPU time on a multicore system is going to be an over-simplification. It’s essentially the total cpu of all processes divided by the number of cores. (In this case, eight).

Using the run-time command of ‘1’ (that’s the number one), you get a different output:

top - 11:28:25 up 2 days, 11:23, 2 users, load average: 0.14, 0.21, 0.27
Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie
Cpu0 : 13.3%us, 1.3%sy, 0.0%ni, 84.1%id, 0.7%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu1 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.7%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8174036k total, 7754924k used, 419112k free, 261028k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5104464k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 20 18.7 134:23.33 java

That makes a LOT more sense. So the default output is showing the cpu consumption of the java process at the individual core level, but the rollup as an average across all the cores. This new output shows how it’s getting distributed across the cores.

By the same token, if you go back to the default output and hit ‘I’ (that’s a capital i), you will get a cpu consumption number for the java process divided by the number of cores:

top - 11:31:31 up 2 days, 11:26, 2 users, load average: 0.14, 0.22, 0.26
Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.0%us, 0.5%sy, 0.0%ni, 96.0%id, 0.3%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 8174036k total, 7808624k used, 365412k free, 261136k buffers
Swap: 2031608k total, 4k used, 2031604k free, 5156848k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26229 root 25 0 2181m 1.5g 96m S 3.5 18.7 135:04.92 java
16890 nsmc 15 0 12736 1144 816 R 0.0 0.0 0:01.32 top
1 root 15 0 10344 680 572 S 0.0 0.0 0:02.47 init

So now you’re consistent again.

I’m not sure why these two views aren’t presented from the same perspective by default.

Tuesday, August 25, 2009

IOPS

Input/Output Operations Per Second (aka "IOPS") is a performance metric for hard disks that represents how many storage operations (reads/writes) the disk can execute in a given second. This, combined with how much actual data can be passed per operation, determines the throughput of the disk. The difference between the two is important since a 1-byte operation has the same IOPS cost as a 1KB operation. (Assuming your drive can handle at least 1KB per operation.)

You can find the number of IOPS being used by your system via vmstat:

HOST::~$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 50432 388 1083600 0 0 407 290 1 0 1 5 88 6
0 0 0 49892 388 1084348 0 0 119 39 1295 1182 0 2 92 6
0 0 0 56856 388 1078296 0 0 324 99 1799 1972 0 4 86 10
0 0 0 56440 388 1078704 0 0 87 7 1673 1885 0 4 93 2
0 0 0 54880 388 1080472 0 0 320 32 1573 1518 0 2 89 8
0 0 0 53020 388 1082512 0 0 384 43 1556 1536 0 2 91 6
1 0 0 51568 388 1083736 0 0 232 45 1460 1453 0 1 90 8

Note that with vmstat, you can ignore the first line - it's a rollup average of stats since the last restart.

The stats you want to look at are the io columns bi and bo, and the cpu column wa. bi stands for "Blocks In" and bo for "Blocks Out". It's a bit counter-intuitive, though, in that they represent blocks moving into and out of the kernel's memory space, NOT into and out of the device. In fact, because of that distinction, writes to devices are actually represented by bo and reads from devices by bi.

wa stands for "wait time" and tells you how much of your cpu time is spent waiting for I/O to complete before moving to the next instruction. If your wait time is high, your device I/O is getting choked - probably from insufficient IOPS capacity. You can track down which device it is using iostat, though figuring out which process is responsible is still a bit of an art. (I'm pretty sure the hooks for doing it are only just starting to make their way into the kernel.)
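
For example, a quick sketch of the iostat side of it (from the sysstat package) - the per-device r/s and w/s columns give you read and write operations per second, and %util shows how saturated each device is:

# Extended per-device statistics every 5 seconds.
iostat -x 5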

In a RAID environment, the number of IOPS is an aggregation of all the disks that are active within the RAID group. (It doesn't include any online spares that are sitting around idle.) This is essentially the speed limit for a group of disks. If you attempt to perform more I/O operations than your RAID can support, the device or host will have to start queueing them which will lead to dramatically increased CPU wait time.

The performance numbers for IOPS take a little digging to find so I wanted to drop them here for future reference:

SAS - Serial Attached SCSI - 175 for 15K, 125 10K
SATA - Serial ATA - 100 +/-
FC - Fibre Channel - 200 for 15K, 150 for 10K

Note that THESE NUMBERS WILL CHANGE. They represent the current state of things based on my interactions with vendors.

Multiply these numbers by the number of drives that are part of the active RAID and you'll know the theoretical upper boundary of your performance - for example, eight active 15K SAS drives top out around 8 x 175 = 1,400 IOPS. (Which you can then compare to your vmstat output to see where you're landing.)

A great article for more information on how to calculate IOPS/etc:

http://storagearchitect.blogspot.com/2008/09/how-many-iops.html

Friday, July 31, 2009

How to trace end-to-end connections on the Netscaler Load-balancer


First, the easiest way to see the source IPs for incoming traffic is via the ASA firewall logs in syslog.

So to see where ftp connections are coming from, you could use something like

grep [ftp cluster vip] syslog | grep -v ICMP | grep -v [monitoring host's ip] | grep -v local-host


That greps for the Netscaler virtual ip of the ftp-cluster in the syslog file and filters out ICMP and monitoring-host traffic. (As well as connections to itself.)

Additionally, you can see all the active connections on a netscaler by ssh'ing to the CLI and running:

> show connectiontable | grep [ftp cluster vip]
[client ip] 45534 [ftp cluster vip] 21 FTP 7 TIME_WAIT
[client ip] 32570 [ftp cluster vip] 21 FTP 9 ESTABLISHED


That shows you where they're coming from. To find out where they're going as well, you need to check the persistence table:

> show persistence
Type SRC-IP DST-IP PORT VSNAME TIMEOUT REF_CNT
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 103 1
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 0 1
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 75 0
SOURCEIP [client ip] [ftp server ip] 21 ftp_cluster 91 0


NOTE: This only works on vservers where persistence is handled by source-ip.

In the case of HTTP traffic, you can insert a header containing the original client IP into the requests that hit the backend services.

There are also a number of Netscaler products that allow you to do extensive log analysis.

Monday, July 27, 2009

Enabling Persistent Routes on a Debian Host

1. su to root
2. cd to /etc/network/
3. Copy off the interfaces file to interfaces.DATE (or what have you)
4. Add lines of the following form under the primary network interface definition:

up route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.2.1.1
down route del -net 10.1.1.0 netmask 255.255.255.0

So you should end up with something like this:

iface bond0 inet static
address 10.2.1.5
netmask 255.255.255.0
network 10.2.1.0
gateway 10.2.1.1
up /sbin/ifenslave bond0 eth0
up /sbin/ifenslave bond0 eth2
up route add -net 10.1.1.0 netmask 255.255.255.0 gw 10.2.1.1
down route del -net 10.1.1.0 netmask 255.255.255.0


That creates a route to the 10.1.1.x network for the host with the ip 10.2.1.5 through the 10.2.1.1 router whenever the interface goes up. (It also removes it whenever the interface goes down.)
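
As a quick sanity check (my addition, not part of the original steps), after cycling the interface the new route should show up in the kernel routing table:

# Verify the static route is present once bond0 is back up.
route -n | grep 10.1.1.0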

How to Disable and Clean Netapp Snapshots

To disable snapshots on a netapp volume, you need to turn on the volume's nosnap option:

vol options volume_name nosnap on


and disable the automatically scheduled snaps

snap sched volume_name 0 0 0


If you need to clear space from the snapshot volume, you can delete the old snaps.

Run

snap list volume_name


to find them, then

snap delete volume_name snap_name


to delete them.

Friday, July 17, 2009

How To Use Netapp SnapMirror

I. To Create a Snapmirror Relationship:

Create source and destination volumes of the same size, backed by same-sized aggregates. (This is critical for being able to change the direction of the sync.)

Go into FilerView > Volumes on the DESTINATION and mark the volume OFFLINE.

Go into FilerView > SnapMirror > Add on the DESTINATION and proceed through accepting all the defaults except, obviously, the volume names.

On the SnapMirror > Manage screen, click the Advanced properties of the new job. Inside the job, click "Initialize". It will clean the target volume and begin the first sync. The sync will begin automatically on schedule which, if you used the defaults, is every minute.


II. To mark a Snapmirror RW:

End the SnapMirror relationship with the

snapmirror break

command. This command changes the destination's status from

snapmirrored


to

broken-off


thus making it writable.

When you're ready to resync them, run the

snapmirror resync


command on the DESTINATION. This will change a former destination's status back to snapmirrored and will resynchronize its contents with the source.

(NOTE: When applied to a former source, snapmirror resync can turn it into a mirror of the former destination. In this way, the roles of source and destination can be reversed.)
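
As a rough sketch, assuming a destination volume named vol_dst (a hypothetical name), the sequence on the destination filer looks something like this - break it to make it writable, then resync it when you're ready to rejoin the mirror:

snapmirror break vol_dst
snapmirror resync vol_dst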

At any time, you can see the status of all the snapmirrors by running the

snapmirror status

command.

Thursday, July 2, 2009

Bookpool is gone!

How terribly, terribly sad!  I noticed their inventory had been pretty lean for a while there, but this was definitely the go-to site for at least half the books in my collection.  

I guess the web giveth and the web taketh away.  

Here's hoping a new store rises from the ashes . . .

Friday, June 26, 2009

Prune - Find's tricky switch

After many thousands of uses, I finally got around to investigating find's exclusion parameters.  Showing its age, it basically consists of an arcane combination of a -path expression followed by -prune to drop the matching directory, joined to the rest of the search with an -o for "or".

I was always sure there was some way to do it.  (With a regex, if nothing else.)  I've read the man page about 100 times and I've never understood how it worked till last week.  It works like a charm, in fact.
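
To spare the next person the 101st read of the man page, here's a minimal sketch of the idiom (the directory name is hypothetical):

# Skip everything under ./tmp; print all other regular files.
find . -path ./tmp -prune -o -type f -print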

That being said, surely something closer to the du --exclude parameter is in order.  Features aren't very useful if people don't know they're there.

Thursday, June 18, 2009

ctime vs mtime

So about that huge file cleanup . . .  it seems that Netapp's ndmpcopy resets all the files' ctimes.  (Meaning there were changes to the inodes somewhere in the process).  Fortunately, the mtimes were preserved so I was able to generate a proper distribution-over-time stat.

I can only assume (since the guy who ran the ndmpcopy swears he didn't do anything else) that this means ndmpcopy is actually creating the inodes on the fly rather than copying the existing ones over.  Good to know.  (rsync preserves those stamps the way I usually run it, so I've never run into this issue before.)
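
For the curious, a distribution like that can be pulled with something along these lines (a sketch of the general approach using GNU find, not necessarily the exact command I used):

# Count files by modification month (GNU find's -printf).
find /path/to/data -type f -printf '%TY-%Tm\n' | sort | uniq -c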


Tuesday, June 16, 2009

Multi-million-file Cleanup

I'm queuing up a series of scripts for tonight to go through the customer filesystems and clean out files older than a certain interval.  Under most circumstances, this would be a simple find and exec.  Once you get past about 400,000 files, though, you're forced to break the operation down into atomic tasks.

Essentially, I do a while-loop that descends to the level of customer directories and then performs the find within each one of them.  Normally, I would move the files to a trash directory at the root of the filesystem, but this is on the new netapps, so a) it's mounted over NFS and moves would be extremely expensive, and b) there's a built-in snapshotting capability in WAFL.  So I'll snap it first, then run the cleaner.  (On the plus side, I've been running this cleaner against our existing storage for years, so the basic algorithm is rock-solid.)
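
The shape of it is roughly this (the paths and the age threshold are placeholders, not the actual script):

# Hypothetical sketch: handle one customer directory per find invocation.
ls -d /customers/*/ | while read dir; do
    find "$dir" -type f -mtime +90 -exec rm -f {} \;
done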

Monday, June 15, 2009

New Hosting Provider

I lost my last blog/troubleshooting matrix to a break-in at my last provider, so I'm giving JustHost a try.  It will be interesting to see how the domain transfer process has evolved over the last few years . . .