I have also had slow scrubbing on filesystems with lots of files, and I agree that it seems to degrade badly: for me, scrub times went from 24 hours to 72 hours over a matter of a few weeks.
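(In case anyone wants to reproduce the timings: I just kicked off scrubs by hand and polled the status output; "tank" below is a placeholder pool name.)

    # start a scrub, then check how far along it is
    zpool scrub tank
    zpool status tank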
I did these things on a pool in-place, which helped a lot (no rebuilding of the pool itself):

1. Reduced the number of snapshots (auto snapshots can generate a lot of files/objects).
2. Disabled compression and rebuilt the affected datasets (is compression on for you?).
3. Upgraded to b129, which has metadata prefetch for scrub; that alone seems to help by ~2x.
4. Tar'd up some extremely large folders.
5. Added 50% more RAM.
6. Turned off atime.

(Rough versions of the commands for these steps are in the P.S. below.)

My scrubs went from 80 hours to 12 with these changes (4TB used, ~10M files, plus 10 snapshots of each filesystem). I haven't figured out whether "disable compression" or "fewer snapshots/files and more RAM" made the bigger difference. My assumption is that once the number of files exceeds what the ARC can hold metadata for, you get dramatically lower performance, and maybe compression adds some additional overhead on top of that, but I don't know; this is just what worked.

It would be nice to have a benchmark set for features like this, and general recommendations for RAM/ARC size based on number of files, etc. How does ARC usage scale with snapshots? A scrub on a huge maildir machine seems like it would make a nice benchmark.

I used "zdb -d pool" to figure out which filesystems had a lot of objects, and chose places to trim based on that.

mike

On Tue, Dec 15, 2009 at 6:41 PM, Bob Friesenhahn
<bfrie...@simple.dallas.tx.us> wrote:

> On Tue, 15 Dec 2009, Bill Sprouse wrote:
>
>> Hi Everyone,
>>
>> I hope this is the right forum for this question. A customer is using
>> a Thumper as an NFS file server to provide the mail store for multiple
>> email servers (Dovecot). They find that when a zpool is freshly
>> created and
>
> It seems that Dovecot's speed optimizations for mbox format are
> specially designed to break zfs:
>
> http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations
>
> This explains why using a tiny 8k recordsize temporarily "improved"
> performance. Tiny updates seem to be abnormal for a mail server. The
> many tiny updates, combined with zfs COW, conspire to spread the data
> around the disk, requiring a seek for each 8k of data. If more data
> were written at once, and much larger blocks were used, then the
> filesystem would continue to perform much better, although perhaps
> less well initially. If the system has sufficient RAM, or a large
> enough L2ARC, then Dovecot's optimizations to diminish reads become
> meaningless.
>
>> Is this expected behavior given the application (email - small, random
>> writes/reads)? Are there recommendations for system/ZFS/NFS
>> configurations to improve this sort of thing? Are there best practices
>> for structuring backups to avoid a directory walk?
>
> Zfs works best when whole files are re-written rather than updated in
> place, as Dovecot seems to want to do. Either the user mailboxes should
> be re-written entirely when they are "expunged", or else a different
> mail storage format which writes entire files, or much larger records,
> should be used.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
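P.S. For anyone who wants to try the same in-place changes, these are roughly the commands involved. "tank" and "tank/mail" are placeholder pool/dataset names and the snapshot name is just an example; note also that compression=off only affects data written after the change, which is why I had to rewrite the affected datasets to see the benefit:

    # see which datasets have the most objects (files), to pick trim targets
    zdb -d tank

    # destroy an old snapshot (repeat for each one you want to drop;
    # "zfs list -t snapshot" lists them all)
    zfs destroy tank/mail@snap-2009-12-01

    # stop compressing newly written data (existing blocks stay compressed)
    zfs set compression=off tank/mail

    # stop access-time updates on reads
    zfs set atime=off tank/mail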
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss