On Oct 17, 2006, at 12:43 PM, Matthew Ahrens wrote:

Jeremy Teo wrote:
Heya Anton,
On 10/17/06, Anton B. Rang <[EMAIL PROTECTED]> wrote:
No, the reason to try to match recordsize to the write size is so that a small write does not turn into a large read + a large write. In configurations where the disk is kept busy, multiplying 8K of data transfer up to 256K hurts.

(Actually ZFS goes up to 128k not 256k (yet!))

256K = 128K read + 128K write.

Yes, although most non-COW filesystems have this same problem: they don't write partial blocks either, even though technically they could. (And FYI, checksumming would "take away" the ability to write partial blocks too.)
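To put rough numbers on it: a single 8K write into a 128K record means reading the full 128K record (the checksum covers all of it), patching 8K in memory, re-checksumming, and writing the full 128K back out -- 256K of I/O for 8K of data, as above. A minimal sketch in C (not ZFS source; the flat-file "record store" and toy checksum are just stand-ins):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORDSIZE (128 * 1024)   /* 128K, ZFS's current maximum recordsize */

/* Toy stand-in for a real block checksum (fletcher, SHA-256, ...). */
static uint64_t toy_checksum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Overwrite 'len' bytes at offset 'off' inside record 'rec_no' of fd. */
int partial_record_write(int fd, uint64_t rec_no,
                         const void *data, size_t off, size_t len)
{
    unsigned char *rec;
    off_t rec_start = (off_t)(rec_no * RECORDSIZE);
    uint64_t cksum;

    if (off + len > RECORDSIZE)
        return -1;
    if ((rec = malloc(RECORDSIZE)) == NULL)
        return -1;

    /* The checksum covers the whole record, so all 128K must be read
     * even though only 8K is changing. */
    if (pread(fd, rec, RECORDSIZE, rec_start) != RECORDSIZE)
        goto fail;

    memcpy(rec + off, data, len);            /* patch the 8K in memory */

    cksum = toy_checksum(rec, RECORDSIZE);   /* re-checksum all 128K */
    (void)cksum;  /* a COW filesystem would store this in the parent block pointer */

    /* ...and the whole 128K record goes back out (COW would write it
     * to a new location rather than in place). */
    if (pwrite(fd, rec, RECORDSIZE, rec_start) != RECORDSIZE)
        goto fail;

    free(rec);
    return 0;
fail:
    free(rec);
    return -1;
}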

In direct I/O mode, though, which is commonly used for databases, writes only affect individual disk blocks, not whole file system blocks. (At least for UFS & QFS, but I presume VxFS is similar.)
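For what it's worth, on Solaris UFS a database typically turns this on with directio(3C) (or the forcedirectio mount option). A rough sketch, with the path made up and error handling mostly trimmed:

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DB_BLOCK 8192    /* typical database block size */

int main(void)
{
    int fd;
    void *buf;

    fd = open("/db/datafile", O_RDWR);       /* path is hypothetical */
    if (fd < 0)
        return 1;

    /* Ask UFS to bypass the page cache for this file. */
    if (directio(fd, DIRECTIO_ON) != 0)
        return 1;

    /* Direct I/O wants aligned buffers and offsets; memalign() is the
     * traditional Solaris call (declared in <stdlib.h> there). */
    buf = memalign(DB_BLOCK, DB_BLOCK);
    if (buf == NULL)
        return 1;

    memset(buf, 0, DB_BLOCK);   /* stand-in for a real 8K database block */

    /* With direct I/O this hits the disk as an 8K write; it does not
     * drag in (read-modify-write) the surrounding file system block. */
    if (pwrite(fd, buf, DB_BLOCK, 0) != DB_BLOCK)
        return 1;

    close(fd);
    return 0;
}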

In the case of QFS in paged mode, only dirty pages are written, not whole file system blocks ("disk allocation units", or "DAUs", in QFS terminology). It's common to use 2 MB or larger DAUs to reduce allocation overhead, improve contiguity, and reduce the need for indirect blocks. I'm not sure if this is the case for UFS with 8K blocks and 4K pages, but I imagine it is.

As you say, checksumming requires either that whole "checksum blocks" (not necessarily file system blocks!) be processed, or that the checksum function is reversible, in the sense that composition and inverse functions for it exist:

  checksum(ABC) = f(g(A), g(B), g(C))

and there exists a g^-1(B) such that we can compute either

  checksum(AB'C) = f(g(A), g(B'), g(C))

or

  checksum(AB'C) = h(checksum(ABC), range(A), range(B), range(C), g^-1(B), g(B'))

[The latter approach comes from a paper I can't track down right now; if anyone's familiar with it, I'd love to get the reference again.]
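To make that concrete with a deliberately trivial choice of f and g (plain modular addition; a cryptographic checksum like SHA-256 is, by design, not reversible in this sense), here is a toy example:

#include <stdint.h>
#include <stdio.h>

/* g(): checksum of one block; here just a byte sum mod 2^64. */
static uint64_t g(const unsigned char *block, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += block[i];
    return sum;
}

/* f(): combine per-block checksums; here, modular addition. */
static uint64_t f(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}

/* h(): fold the change B -> B' into the old whole-object checksum.
 * Under modular addition, g^-1(B) is simply -g(B). */
static uint64_t h(uint64_t old_cksum, uint64_t gB, uint64_t gBprime)
{
    return old_cksum - gB + gBprime;
}

int main(void)
{
    unsigned char A[4] = {'a', 'a', 'a', 'a'};
    unsigned char B[4] = {'b', 'b', 'b', 'b'};
    unsigned char C[4] = {'c', 'c', 'c', 'c'};
    unsigned char Bprime[4] = {'B', 'B', 'B', 'B'};

    uint64_t full = f(g(A, 4), g(B, 4), g(C, 4));       /* checksum(ABC)  */
    uint64_t incr = h(full, g(B, 4), g(Bprime, 4));     /* checksum(AB'C) */
    uint64_t redo = f(g(A, 4), g(Bprime, 4), g(C, 4));  /* recomputed     */

    /* incr and redo agree, without re-reading A or C. */
    printf("incremental %llu == recomputed %llu\n",
           (unsigned long long)incr, (unsigned long long)redo);
    return 0;
}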

-- Anton
