Tuesday, June 16, 2009

Multi-million-file Cleanup

I'm queuing up a series of scripts for tonight to go through the customer filesystems and clean out files older than a certain interval.  Under most circumstances, this would be a simple find and exec.  Once you get past about 400,000 files, though, you're forced to break the operation down into smaller, independent tasks.

Essentially, I do a while-loop that descends to the level of the customer directories and then performs the find within each one of them.  Normally, I would move the files to a trash directory at the root of the filesystem, but this is on the new NetApps, so a) this is mounted over NFS and moves would be extremely expensive and b) there's a built-in snapshotting capability in WAFL.  So I'll snap it first, then run the cleaner.  (On the plus side, I've been running this cleaner against our existing storage for years, so the basic algorithm is rock-solid.)
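
For anyone who wants to see the shape of the per-customer approach, here is a minimal sketch in Python.  It is not the actual cleaner, which is a shell while-loop around find.  The function names, the dry_run flag, and the assumption that each customer is a first-level directory under one root are all made up for illustration.  It deletes in place rather than moving to a trash directory, on the assumption that a snapshot was taken beforehand.

```python
import os
import stat
import time


def old_files(customer_dir, max_age_days, now=None):
    """Yield regular files under customer_dir whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(customer_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are never followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path


def clean_customers(root, max_age_days, dry_run=True, now=None):
    """Walk one customer directory at a time; return {customer: matched file count}."""
    counts = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # only first-level directories count as customers
            matched = 0
            for path in old_files(entry.path, max_age_days, now):
                if not dry_run:
                    try:
                        os.remove(path)
                    except OSError:
                        continue  # could not delete; leave it for the next run
                matched += 1
            counts[entry.name] = matched
    return counts
```

Calling clean_customers("/some/root", 30) with the default dry_run=True only counts what would be removed, which is a cheap way to sanity-check the interval before running it for real with dry_run=False.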
