Friday, June 26, 2009

Prune - Find's tricky switch

After many thousands of uses, I finally got around to investigating find's exclusion parameters.  The syntax shows its age: it basically consists of an arcane string of -path options, each followed by a -prune to exclude that path from the search, and then an -o for "or" before the rest of the expression.

I was always sure there was some way to do it.  (With a regex, if nothing else.)  I've read the man page about 100 times and never understood how it worked until last week.  It works like a charm, in fact.
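
For the record, here is a minimal sketch of the pattern as I understand it.  The scratch tree and the directory names (src, cache) are made up for the demo; only the find expression itself is the point.

```shell
# Hypothetical scratch tree, created only so the example is self-contained.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/cache/tmp"
touch "$demo/src/keep.txt" "$demo/cache/skip.txt" "$demo/cache/tmp/skip2.txt"

# Read as: "if the path is $demo/cache, prune it (do not descend into it);
# otherwise (-o), if it is a regular file, print it."
found=$(find "$demo" -path "$demo/cache" -prune -o -type f -print)
echo "$found"

# Remove the scratch tree again.
rm -rf "$demo"
```

The explicit -print at the end matters.  Without it, find applies its default -print to the whole expression and the pruned directory itself shows up in the output.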

That being said, surely something closer to the du --exclude parameter is in order.  Features aren't very useful if people don't know they're there.

Thursday, June 18, 2009

ctime vs mtime

So about that huge file cleanup . . .  it seems that NetApp's ndmpcopy resets all the files' ctimes (meaning there were changes to the inodes somewhere in the process).  Fortunately, the mtimes were preserved, so I was able to generate a proper distribution-over-time stat.
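
The difference is easy to reproduce on a scratch file.  This is only a sketch: the file name and the 30-day threshold are arbitrary, and touch stands in for the copy tool.  Backdating the mtime is itself an inode change, so the ctime ends up at "now", which is the same situation the copied files were in.

```shell
# Hypothetical scratch file; touch stands in for whatever rewrote the inode.
demo=$(mktemp -d)
touch "$demo/old.log"

# Backdate only the modification time (touch -t takes CCYYMMDDhhmm).
# The ctime is set to the current time as a side effect.
touch -t 200901010000 "$demo/old.log"

# An age test on mtime still sees an old file; the same test on ctime does not.
by_mtime=$(find "$demo" -type f -mtime +30)
by_ctime=$(find "$demo" -type f -ctime +30)
echo "mtime match: ${by_mtime:-none}"
echo "ctime match: ${by_ctime:-none}"

rm -rf "$demo"
```

So any age-based selection after a copy like this has to key on -mtime, not -ctime.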

I can only assume (since the guy who ran the ndmpcopy swears he didn't do anything else) that ndmpcopy is actually creating the inodes on the fly rather than copying the existing ones over.  Good to know.  (rsync preserves those stamps the way I usually run it, so I've never run into this issue before.)


Tuesday, June 16, 2009

Multi-million-file Cleanup

I'm queuing up a series of scripts for tonight to go through the customer filesystems and clean out files older than a certain interval.  Under most circumstances, this would be a simple find and exec.  Once you get past about 400,000 files, though, you're forced to break the operation down into atomic tasks.

Essentially, I do a while-loop that descends to the level of customer directories and then performs the find within each one of them.  Normally, I would move the files to a trash directory at the root of the filesystem, but this is on the new NetApps so a) this is mounted over NFS and moves would be extremely expensive and b) there's a built-in snapshotting capability in WAFL.  So I'll snap it first, then run the cleaner.  (On the plus side, I've been running this cleaner against our existing storage for years, so the basic algorithm is rock-solid.)
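
The shape of the loop is roughly the following.  This is a simplified sketch and not the production script: the function name clean_customers, the one-directory-per-customer layout, and the 90-day cutoff are all assumptions for illustration, and it deletes in place instead of moving to a trash directory.  It also relies on -mindepth and -maxdepth, which GNU and BSD find support but POSIX does not specify.

```shell
# Sketch only: assumes a layout of <root>/<customer>/... (hypothetical).
clean_customers() {
    root=$1   # top of the customer tree
    days=$2   # delete regular files whose mtime is older than this many days

    # One find per customer directory keeps each run small and easy to restart.
    find "$root" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r custdir; do
        # "{} +" batches many file names into each rm invocation.
        find "$custdir" -type f -mtime +"$days" -exec rm -f {} +
    done
}

# Demo on a throwaway tree with one old and one new file per customer.
demo=$(mktemp -d)
mkdir -p "$demo/custA/logs" "$demo/custB"
touch "$demo/custA/logs/new.log" "$demo/custB/new.dat"
touch -t 200901010000 "$demo/custA/logs/old.log" "$demo/custB/old.dat"

clean_customers "$demo" 90

remaining=$(find "$demo" -type f | sort)
echo "$remaining"

rm -rf "$demo"
```

After the run, only the two new files should be left in the demo tree.  On the real filesystems the snapshot has to come first, since there is no trash directory to fall back on.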

Monday, June 15, 2009

New Hosting Provider

I lost my last blog/troubleshooting matrix to a break-in at my last provider, so I'm giving JustHost a try.  It will be interesting to see how the domain transfer process has evolved over the last few years . . .