Thanks Michael,

Useful stuff to try. I wish we could add more memory, but the x4500 is limited to 16GB. Compression was a question. It's currently off, but they were thinking of turning it on.
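
If it helps when they revisit that, something like the following would show the current settings across all the datasets in the pool ("tank" below is just a stand-in for their real pool name):

  # show compression and atime for every dataset in the pool
  zfs get -r compression,atime tank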

bill

On Dec 15, 2009, at 7:02 PM, Michael Herf wrote:

I have also had slow scrubbing on filesystems with lots of files, and I agree that it does seem to degrade badly. For me, it seemed to go from 24 hours to 72 hours in a matter of a few weeks.

I did these things on a pool in place, which helped a lot (no rebuilding):

1. reduced the number of snapshots (auto snapshots can generate a lot of files)
2. disabled compression and rebuilt the affected datasets (is compression on?)
3. upgraded to b129, which has metadata prefetch for scrub; seems to help by ~2x
4. tar'd up some extremely large folders
5. added 50% more RAM
6. turned off atime
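
Roughly, the commands behind a few of those steps look like this (the dataset and snapshot names are just placeholders for illustration):

  # (2) turn compression off; existing blocks stay compressed until they
  #     are rewritten, which is why the affected datasets were rebuilt
  zfs set compression=off tank/mail

  # (6) stop atime updates so reads don't generate extra metadata writes
  zfs set atime=off tank/mail

  # (1) prune old automatic snapshots, one at a time, repeating as needed
  zfs destroy tank/mail@zfs-auto-snap_daily-2009-11-01-00h00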

My scrubs went from 80 hours to 12 with these changes. (4TB used, ~10M files + 10 snapshots each.)

I haven't figured out whether "disable compression" or "fewer snapshots/files and more RAM" made the bigger difference. I'm assuming that once the number of files exceeds what fits in the ARC, you get dramatically lower performance, and maybe that compression adds some additional overhead, but I don't know; this is just what worked.
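
(For anyone who wants to check where they stand, the ARC counters are exposed through kstat on OpenSolaris; for example:)

  # current and target ARC size, in bytes
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c

  # cumulative hit/miss counters, for a rough hit-rate estimate
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses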

It would be nice to have a benchmark set for features like this, and general recommendations for RAM/ARC size based on number of files, etc. How does ARC usage scale with snapshots? Scrub on a huge maildir machine seems like it would make a nice benchmark.

I used "zdb -d pool" to figure out which filesystems had a lot of objects, and figured out places to trim based on that.

mike

On Tue, Dec 15, 2009 at 6:41 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
On Tue, 15 Dec 2009, Bill Sprouse wrote:

Hi Everyone,

I hope this is the right forum for this question. A customer is using a Thumper as an NFS file server to provide the mail store for multiple email servers (Dovecot). They find that when a zpool is freshly created and

It seems that Dovecot's speed optimizations for mbox format are specially designed to break zfs:

http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations

which also explains why using a tiny 8k recordsize temporarily "improved" performance. Tiny updates seem to be abnormal for a mail server. The many tiny updates combined with zfs COW conspire to spread the data around the disk, requiring a seek for each 8k of data. If more data were written at once, and much larger blocks were used, then the filesystem would continue to perform much better over time, although perhaps less well initially. If the system has sufficient RAM, or a large enough L2ARC, then Dovecot's optimizations to diminish reads become meaningless.
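
For example, checking the current setting and going back to the default for new writes would look something like this (the dataset name is a placeholder, and the change only affects blocks written after it is made):

  # see what recordsize the mail store is using now
  zfs get recordsize tank/mailstore

  # return to the 128k default so larger writes produce larger blocks
  zfs set recordsize=128k tank/mailstore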


Is this expected behavior given the application (email - small, random writes/reads)? Are there recommendations for system/ZFS/NFS configurations to improve this sort of thing? Are there best practices for structuring backups to avoid a directory walk?

Zfs works best when whole files are re-written rather than updated in place, as Dovecot seems to want to do. Either the user mailboxes should be re-written entirely when they are "expunged", or a different mail storage format, one that writes entire files or much larger records, should be used.
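
For instance, if changing the store format is an option, Dovecot's maildir backend writes each message as its own file, which suits zfs much better than in-place mbox updates. Something along these lines would show what is configured now, with a sample maildir setting shown as a comment (the path is a placeholder):

  # Dovecot 1.x: print the effective, non-default settings
  dovecot -n | grep mail_location

  # a maildir configuration would look like, e.g.:
  #   mail_location = maildir:~/Maildir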

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

