On Oct 17, 2006, at 12:43 PM, Matthew Ahrens wrote:

Jeremy Teo wrote:
Heya Anton,
On 10/17/06, Anton B. Rang <[EMAIL PROTECTED]> wrote:
No, the reason to try to match recordsize to the write size is so that a small write does not turn into a large read + a large write. In configurations where the disk is kept busy, multiplying 8K of data transfer up to 256K hurts.

(Actually ZFS goes up to 128k not 256k (yet!))

256K = 128K read + 128K write.

Yes, although most non-COW filesystems have this same problem: they don't write partial blocks either, even though technically they could. (And FYI, checksumming would "take away" the ability to write partial blocks too.)
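To put rough numbers on it: a single 8K write into a 128K record means reading the full 128K record (the checksum covers all of it), patching 8K in memory, re-checksumming, and writing the full 128K back out -- 256K of I/O for 8K of data, as above. A minimal sketch in C (not ZFS source; the flat-file "record store" and toy checksum are just stand-ins):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORDSIZE (128 * 1024)   /* 128K, ZFS's current maximum recordsize */

/* Toy stand-in for a real block checksum (fletcher, SHA-256, ...). */
static uint64_t toy_checksum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Overwrite 'len' bytes at offset 'off' inside record 'rec_no' of fd. */
int partial_record_write(int fd, uint64_t rec_no,
                         const void *data, size_t off, size_t len)
{
    unsigned char *rec;
    off_t rec_start = (off_t)(rec_no * RECORDSIZE);
    uint64_t cksum;

    if (off + len > RECORDSIZE)
        return -1;
    if ((rec = malloc(RECORDSIZE)) == NULL)
        return -1;

    /* The checksum covers the whole record, so all 128K must be read
     * even though only 8K is changing. */
    if (pread(fd, rec, RECORDSIZE, rec_start) != RECORDSIZE)
        goto fail;

    memcpy(rec + off, data, len);            /* patch the 8K in memory */

    cksum = toy_checksum(rec, RECORDSIZE);   /* re-checksum all 128K */
    (void)cksum;  /* a COW filesystem would store this in the parent block pointer */

    /* ...and the whole 128K record goes back out (COW would write it
     * to a new location rather than in place). */
    if (pwrite(fd, rec, RECORDSIZE, rec_start) != RECORDSIZE)
        goto fail;

    free(rec);
    return 0;
fail:
    free(rec);
    return -1;
}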

In direct I/O mode, though, which is commonly used for databases, writes only affect individual disk blocks, not whole file system blocks. (At least for UFS & QFS, but I presume VxFS is similar.)
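For what it's worth, on Solaris UFS a database typically turns this on with directio(3C) (or the forcedirectio mount option). A rough sketch, with the path made up and error handling mostly trimmed:

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DB_BLOCK 8192    /* typical database block size */

int main(void)
{
    int fd;
    void *buf;

    fd = open("/db/datafile", O_RDWR);       /* path is hypothetical */
    if (fd < 0)
        return 1;

    /* Ask UFS to bypass the page cache for this file. */
    if (directio(fd, DIRECTIO_ON) != 0)
        return 1;

    /* Direct I/O wants aligned buffers and offsets; memalign() is the
     * traditional Solaris call (declared in <stdlib.h> there). */
    buf = memalign(DB_BLOCK, DB_BLOCK);
    if (buf == NULL)
        return 1;

    memset(buf, 0, DB_BLOCK);   /* stand-in for a real 8K database block */

    /* With direct I/O this hits the disk as an 8K write; it does not
     * drag in (read-modify-write) the surrounding file system block. */
    if (pwrite(fd, buf, DB_BLOCK, 0) != DB_BLOCK)
        return 1;

    close(fd);
    return 0;
}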

In the case of QFS in paged mode, only dirty pages are written, not whole file system blocks ("disk allocation units", or "DAUs", in QFS terminology). It's common to use 2 MB or larger DAUs to reduce allocation overhead, improve contiguity, and reduce the need for indirect blocks. I'm not sure if this is the case for UFS with 8K blocks and 4K pages, but I imagine it is.

As you say, checksumming requires either that whole "checksum blocks" (not necessarily file system blocks!) be processed, or that the checksum function is reversible, in the sense that composition and inverse functions for it exist:

  checksum(ABC) = f(g(A), g(B), g(C))

and there exists a g^-1(B) such that we can compute either

  checksum(AB'C) = f(g(A), g(B'), g(C))

or

  checksum(AB'C) = h(checksum(ABC), range(A), range(B), range(C), g^-1(B), g(B'))

[The latter approach comes from a paper I can't track down right now; if anyone's familiar with it, I'd love to get the reference again.]
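To make that concrete with a deliberately trivial choice of f and g (plain modular addition; a cryptographic checksum like SHA-256 is, by design, not reversible in this sense), here is a toy example:

#include <stdint.h>
#include <stdio.h>

/* g(): checksum of one block; here just a byte sum mod 2^64. */
static uint64_t g(const unsigned char *block, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += block[i];
    return sum;
}

/* f(): combine per-block checksums; here, modular addition. */
static uint64_t f(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}

/* h(): fold the change B -> B' into the old whole-object checksum.
 * Under modular addition, g^-1(B) is simply -g(B). */
static uint64_t h(uint64_t old_cksum, uint64_t gB, uint64_t gBprime)
{
    return old_cksum - gB + gBprime;
}

int main(void)
{
    unsigned char A[4] = {'a', 'a', 'a', 'a'};
    unsigned char B[4] = {'b', 'b', 'b', 'b'};
    unsigned char C[4] = {'c', 'c', 'c', 'c'};
    unsigned char Bprime[4] = {'B', 'B', 'B', 'B'};

    uint64_t full = f(g(A, 4), g(B, 4), g(C, 4));       /* checksum(ABC)  */
    uint64_t incr = h(full, g(B, 4), g(Bprime, 4));     /* checksum(AB'C) */
    uint64_t redo = f(g(A, 4), g(Bprime, 4), g(C, 4));  /* recomputed     */

    /* incr and redo agree, without re-reading A or C. */
    printf("incremental %llu == recomputed %llu\n",
           (unsigned long long)incr, (unsigned long long)redo);
    return 0;
}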

-- Anton
