Hi,

This page may indicate the root cause.
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle

ZFS throttles incoming writes so that the rate at which data enters a
transaction group (txg) matches the speed of disk I/O. If it detects that
the modest measure (a one-tick pause) cannot keep the txg from growing too
large, it falls back to stalling all write requests. That could be the
situation you have observed.
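
To make the two stages concrete, here is a minimal sketch of the decision
as I understand it. It is not the actual ZFS code: the function and
parameter names are hypothetical, and the 7/8 threshold is my reading of
the linked post, so please treat it as an assumption.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"          # write enters the open txg immediately
    DELAY_ONE_TICK = "delay"   # writer is paused for one clock tick first
    STALL = "stall"            # writer blocks until the next txg opens


def throttle_decision(txg_dirty_bytes, write_limit_bytes, delay_fraction=7 / 8):
    """Illustrative model only; names and the 7/8 default are assumptions."""
    # Hard limit reached: the modest measure was not enough, so stall.
    if txg_dirty_bytes >= write_limit_bytes:
        return Action.STALL
    # Close to the limit: slow the writer down by a single tick.
    if txg_dirty_bytes >= delay_fraction * write_limit_bytes:
        return Action.DELAY_ONE_TICK
    # Plenty of room left in the txg: accept the write right away.
    return Action.ACCEPT


print(throttle_decision(100, 1000))    # Action.ACCEPT
print(throttle_decision(900, 1000))    # Action.DELAY_ONE_TICK
print(throttle_decision(1000, 1000))   # Action.STALL
```

In this model the long pauses correspond to the STALL branch, where a
writer has to wait for the whole txg to be flushed.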

However, please note that this may not be correct, since I am not a ZFS
developer.

As a workaround, you could add more disks to the ZFS pool to gain bandwidth
and alleviate the problem. Alternatively, you could disable write throttling
if you are sure the writes only arrive in bursts rather than as a sustained
load. Again, I'm not sure whether the latter solution is feasible.
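
As a rough illustration of why more disks should help, the sketch below
estimates how long one txg takes to flush. It assumes that write bandwidth
scales linearly with the number of disks and that each disk sustains a
fixed rate; both are simplifications, and the function name and numbers are
made up for the example.

```python
def estimated_sync_seconds(txg_bytes, n_disks, per_disk_mb_per_s):
    """Idealized flush time for one txg (assumes linear scaling across disks)."""
    pool_bytes_per_s = n_disks * per_disk_mb_per_s * 1_000_000
    return txg_bytes / pool_bytes_per_s


# A hypothetical 1 GB txg on disks that each sustain 50 MB/s:
print(estimated_sync_seconds(1_000_000_000, 2, 50))   # 10.0
print(estimated_sync_seconds(1_000_000_000, 4, 50))   # 5.0
```

Under these assumptions, doubling the disks halves the time during which
writers could be stalled behind a full txg.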

best regards,
hanzhu


On Sat, Feb 27, 2010 at 2:29 AM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Fri, 26 Feb 2010, Shane Cox wrote:
>
>>
>> I've reviewed the forum archives and read a number of threads related to
>> this issue.  However I
>> didn't find a root-cause explanation for these pauses, only talk of how to
>> ameliorate them.  In my
>> particular case, I would like to know why zfs_log_writes are blocked for
>> 180ms on a mutex (seemingly
>> blocked on the intent log itself) when performing zil_itx_assign.  Another
>> thread must have a lock on
>> the intent log, no?  Overall, the system appears healthy as other system
>> calls (e.g., reads and
>> writes to network devices) complete successfully while writes to the
>> intent log are blocked ... so
>> the problem seems to be access to the zfs intent log.
>> Any additional insight would be appreciated.
>>
>
> As far as I am aware, none of the zfs authors has been willing to address
> this issue in public.  It is not clear (to me) if the fundamental design of
> zfs transaction groups requires that writes stop briefly until the
> transaction group has been flushed to disk.  I suspect that this is the
> case.
>
> Perhaps zfs will never meet your timing requirements.  Others here have had
> considerable success by using RAID interface adaptor cards with
> battery-backed cache memory and configuring those cards to "IT" JBOD mode.
>  By limiting the TXG group size to the amount which will fit in
> battery-backed cache memory, the time to "commit" the TXG group is
> dramatically reduced as long as the continual write rate does not exceed
> what the backing disks can sustain.  Unfortunately, this may increase the
> total amount of data written to underlying storage.
>
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>