I have also had slow scrubbing on filesystems with lots of files, and I
agree that it degrades badly. For me, scrub time went from 24 hours to 72
hours over the course of a few weeks.

I did these things on a pool in-place, which helped a lot (no rebuilding);
the commands involved are sketched after the list:
1. reduced the number of snapshots (auto snapshots can keep a lot of dead
   files around for scrub to walk).
2. disabled compression and rebuilt the affected datasets (is compression
   on for you?).
3. upgraded to b129, which adds metadata prefetch for scrub; that alone
   seemed to help by maybe 2x.
4. tar'd up some extremely large folders.
5. added 50% more RAM.
6. turned off atime.
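
For reference, the in-place changes boil down to commands like these
(dataset and snapshot names here are made-up placeholders, not my actual
layout):

  # turn off compression -- note this only affects newly written data,
  # which is why the affected datasets had to be rebuilt afterwards
  zfs set compression=off pool/dataset

  # stop updating access times on every read
  zfs set atime=off pool/dataset

  # list snapshots, then prune the auto-snapshot backlog
  zfs list -t snapshot -r pool/dataset
  zfs destroy pool/dataset@some-old-auto-snapshot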

My scrubs went from 80 hours down to 12 with these changes (4TB used,
~10M files, with 10 snapshots on each dataset).

I haven't figured out whether "disable compression" or "fewer
snapshots/files plus more RAM" made the bigger difference. My working
assumption is that once the number of files exceeds what the ARC can
cache, performance drops dramatically, and that compression adds some
extra overhead on top of that, but I don't know for sure; this is just
what worked.
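
If you want to eyeball ARC usage on your own box, the arcstats kstats are
one place to look (this is just how I'd check it; I don't have solid
numbers for where the cliff is):

  # current ARC size in bytes, plus metadata usage vs. its limit
  kstat -p zfs:0:arcstats:size
  kstat -p zfs:0:arcstats:arc_meta_used
  kstat -p zfs:0:arcstats:arc_meta_limit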

It would be nice to have a benchmark set for features like this, and
general recommendations for RAM/ARC sizing based on number of files and
so on. How does ARC usage scale with the number of snapshots? A scrub on
a huge maildir machine seems like it would make a good benchmark.

I used "zdb -d pool" to figure out which filesystems had a lot of objects,
and figured out places to trim based on that.
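
For example, the per-dataset summary lines look roughly like this (names
and counts invented for illustration):

  # prints one summary line per dataset, including the object count
  zdb -d pool
  Dataset pool/home [ZPL], ID 30, cr_txg 1200, 3.10G, 421532 objects
  Dataset pool/mail [ZPL], ID 42, cr_txg 1250, 812G, 5632910 objects

The datasets with millions of objects were the obvious places to start
trimming or tar'ing.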

mike

On Tue, Dec 15, 2009 at 6:41 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Tue, 15 Dec 2009, Bill Sprouse wrote:
>
>> Hi Everyone,
>>
>> I hope this is the right forum for this question.  A customer is using a
>> Thumper as an NFS file server to provide the mail store for multiple email
>> servers (Dovecot).  They find that when a zpool is freshly created and
>
> It seems that Dovecot's speed optimizations for mbox format are specially
> designed to break zfs (see
> http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations ),
> which explains why using a tiny 8k recordsize temporarily "improved"
> performance.  Tiny updates seem to be abnormal for a mail server.  The many
> tiny updates combined with zfs COW conspire to spread the data around the
> disk, requiring a seek for each 8k of data.  If more data were written at
> once, and much larger blocks were used, then the filesystem would continue
> to perform much better, although perhaps less well initially.  If the
> system has sufficient RAM, or a large enough L2ARC, then Dovecot's
> optimizations to diminish reads become meaningless.
>
>
>> Is this expected behavior given the application (email - small, random
>> writes/reads)?  Are there recommendations for system/ZFS/NFS configurations
>> to improve this sort of thing?  Are there best practices for structuring
>> backups to avoid a directory walk?
>
> Zfs works best when whole files are re-written rather than updated in place
> as Dovecot seems to want to do.  Either the user mailboxes should be
> re-written entirely when they are "expunged" or else a different mail
> storage format which writes entire files, or much larger records, should be
> used.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
