On Oct 17, 2006, at 12:43 PM, Matthew Ahrens wrote:
> Jeremy Teo wrote:
>> Heya Anton,
>> On 10/17/06, Anton B. Rang <[EMAIL PROTECTED]> wrote:
>>> No, the reason to try to match recordsize to the write size is so
>>> that a small write does not turn into a large read + a large
>>> write. In configurations where the disk is kept busy,
>>> multiplying 8K of data transfer up to 256K hurts.
>> (Actually ZFS goes up to 128k not 256k (yet!))
>
> 256K = 128K read + 128K write.
>
> Yes, although actually most non-COW filesystems have this same
> problem, because they don't write partial blocks either, even
> though technically they could. (And FYI, checksumming would "take
> away" the ability to write partial blocks too.)
In direct I/O mode, though, which is commonly used for databases,
writes affect only the individual disk blocks involved, not whole
file system blocks. (At least for UFS & QFS, but I presume VxFS is
similar.)
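For the curious, here's a minimal, untested sketch of what that looks
like on Solaris with directio(3C); the file name, offset, and the 8K
size are made-up examples:

    /* Enable direct I/O on a UFS/QFS file and issue one aligned 8K
     * write, so only the touched disk blocks are transferred, not a
     * whole file system block. */
    #include <sys/types.h>
    #include <sys/fcntl.h>      /* directio(), DIRECTIO_ON */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t iosize = 8192;           /* 8K application write  */
        const off_t  offset = 128 * 1024;     /* sector-aligned offset */
        void *buf;
        int fd;

        if ((fd = open("/ufs/testfile", O_WRONLY)) < 0) {
            perror("open");
            return 1;
        }
        if (directio(fd, DIRECTIO_ON) != 0)   /* bypass the page cache */
            perror("directio");

        /* Direct I/O wants sector-aligned buffers; align generously. */
        if ((buf = memalign(8192, iosize)) == NULL)
            return 1;
        memset(buf, 'x', iosize);

        if (pwrite(fd, buf, iosize, offset) != (ssize_t)iosize)
            perror("pwrite");

        free(buf);
        (void) close(fd);
        return 0;
    }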
In the case of QFS in paged mode, only dirty pages are written, not
whole file system blocks ("disk allocation units", or "DAUs", in QFS
terminology). It's common to use 2 MB or larger DAUs to reduce
allocation overhead, improve contiguity, and reduce the need for
indirect blocks. I'm not sure if this is the case for UFS with 8K
blocks and 4K pages, but I imagine it is.
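In other words, an update like the one in this untested sketch only
has to push out the dirty page, not the surrounding DAU (the path and
page index are made up, and it assumes the file is several pages
long):

    /* Dirty a single page of an mmap'd file and flush it with msync();
     * a paged file system writes back just that dirty page, not the
     * whole DAU / file system block it lives in. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        long pagesize = sysconf(_SC_PAGESIZE);  /* 4K x86, 8K SPARC */
        int fd = open("/qfs/testfile", O_RDWR);
        struct stat st;
        char *map;

        if (fd < 0 || fstat(fd, &st) != 0) {
            perror("open/fstat");
            return 1;
        }
        map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, 0);
        if (map == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        map[3 * pagesize] = 'x';                /* dirty exactly one page */

        /* Only the dirty page has to go out, however big the DAU is. */
        if (msync(map + 3 * pagesize, pagesize, MS_SYNC) != 0)
            perror("msync");

        (void) munmap(map, st.st_size);
        (void) close(fd);
        return 0;
    }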
As you say, checksumming requires either that whole "checksum
blocks" (not necessarily file system blocks!) be processed, or that
the checksum function is reversible, in the sense that inverse and
composition functions exist for it:

    checksum(ABC) = f(g(A), g(B), g(C))

and, when B is overwritten by B', either

    checksum(AB'C) = f(g(A), g(B'), g(C))

can be computed from retained per-block summaries, or there is an
inverse g^-1 such that

    checksum(AB'C) = h(checksum(ABC), range(A), range(B), range(C),
                       g^-1(B), g(B'))

[The latter approach comes from a paper I can't track down right now;
if anyone's familiar with it, I'd love to get the reference again.]
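To make the algebra concrete, here's a toy sketch (this is just the
algebra, not ZFS's actual fletcher or SHA-256 checksums): g()
summarizes a block as a position-weighted sum over its byte range,
f() combines summaries by addition mod 2^64, and h() patches an
existing checksum using g^-1 (here just negation) of the old block
plus g() of the new one; the range() arguments above are folded into
the offset passed to g().

    #include <stdint.h>
    #include <stdio.h>

    /* g: summarize one block as a position-weighted 64-bit sum
     * ("base" is the block's byte offset in the file). */
    static uint64_t g(const unsigned char *blk, size_t len, uint64_t base)
    {
        uint64_t s = 0;
        for (size_t i = 0; i < len; i++)
            s += (base + i + 1) * (uint64_t)blk[i];
        return s;
    }

    /* f: combine per-block summaries (addition mod 2^64). */
    static uint64_t f(uint64_t a, uint64_t b, uint64_t c)
    {
        return a + b + c;
    }

    /* h: update an existing checksum when block B changes, without
     * re-reading A or C; g^-1 is simply negation here. */
    static uint64_t h(uint64_t old, uint64_t g_old_b, uint64_t g_new_b)
    {
        return old - g_old_b + g_new_b;
    }

    int main(void)
    {
        unsigned char A[4]  = { 1, 2, 3, 4 };
        unsigned char B[4]  = { 5, 6, 7, 8 };
        unsigned char C[4]  = { 9, 10, 11, 12 };
        unsigned char B2[4] = { 42, 42, 42, 42 };   /* B overwritten */

        uint64_t full = f(g(A, 4, 0), g(B, 4, 4), g(C, 4, 8));
        uint64_t incr = h(full, g(B, 4, 4), g(B2, 4, 4));
        uint64_t redo = f(g(A, 4, 0), g(B2, 4, 4), g(C, 4, 8));

        printf("incremental %llu == recomputed %llu\n",
               (unsigned long long)incr, (unsigned long long)redo);
        return 0;
    }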
-- Anton
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss