On Fri, 3 Jul 2009, Victor Latushkin wrote:

> On 02.07.09 22:05, Bob Friesenhahn wrote:
>> On Thu, 2 Jul 2009, Zhu, Lejun wrote:
>>>
>>> Actually it seems to be 3/4:
>>
>> 3/4 is an awful lot. That would be 15 GB on my system, which explains
>> why the "5 seconds to write" rule is dominant.
>
> 3/4 is 1/8 * 6, where 6 is the worst-case inflation factor (for raid-z2
> it is actually 9, and considering a ganged 1k block on raid-z2 in a
> really bad case it should be even bigger than that). The DSL inflates
> write sizes too, so inflated write sizes are compared against an
> inflated limit, and that should be fine.
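For concreteness, the arithmetic can be sketched as follows. This is a rough illustration only: the 1/8-of-physmem base and the worst-case inflation factor of 6 are the figures quoted in this thread, not values read out of the ZFS source.

```python
# Rough sketch of the inflated write-limit arithmetic discussed above.
# The 1/8-of-physmem base and the worst-case inflation factor of 6 are
# the figures quoted in this thread, not values from the ZFS source.

def inflated_write_limit(physmem_bytes, inflation_factor=6):
    base_limit = physmem_bytes // 8          # 1/8 of physical memory
    return base_limit * inflation_factor     # compared against inflated write sizes

GIB = 1024 ** 3
physmem = 20 * GIB                           # e.g. a 20 GB system, as in Bob's case
limit = inflated_write_limit(physmem)
print(limit / GIB)                           # 6/8 = 3/4 of physmem -> 15.0 GiB
```

So on a 20 GB machine the inflated limit works out to 3/4 of physical memory, matching the 15 GB figure above.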

But blocking read I/O for several seconds is not so fine. There are various amounts of buffering and caching in the write pipeline, which suggests that a certain amount of write data is handled efficiently by it. Once the buffers and caches fill, and the disks are maximally busy with write I/O, there is no opportunity to do a read from the same disks for several seconds (up to five). When a TXG is written, the system writes just as fast and hard as it can (for up to five seconds) without considering other requirements.

ZFS's asynchronous write caching is speculative: it hopes that the application will update the just-written data several times, so that only the final version needs to be written, saving disk I/O and precious IOPS. Unfortunately, not all applications work that way.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
