I'll try to keep it simple and describe what is true in 'most' cases. For some values of 'most' ;-)
The terminology is confusing at times. "Block" mostly refers to a logical filesystem block, which can be variable in size. There's also "checksum" and "parity", which are two completely independent things.
* The green and blue "blocks" shown in the diagram on page 11, do they represent actual physical blocks on individual disks or a single RAID-Z stripe write across multiple disks?
See Page 17: These are logical blocks, and can be variable in size.
* The parity for RAID-Z, where is it? Surely not the checksum stored together in the upper level direct and indirect block pointer? And if not, and it is written as a separate block on another disk, then... I am afraid I do not understand.
RAID-Z parity vs. ZFS checksum:
The parity is just a chunk of XOR-ed data written for redundancy, and is part of the same I/O transaction as the data it protects. The checksum is a much smaller digest of the data, used for detecting the various modes of data corruption. The checksum, not the parity, is what goes into the metadata (logical) blocks above. A ZFS file system always has checksums and can function without parity.
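To make the difference concrete, here is a minimal Python sketch. It is not ZFS code: it assumes plain XOR parity over equal-sized chunks, uses SHA-256 from hashlib as a stand-in for whatever checksum the pool is configured with, and the names xor_chunks and digest are made up for illustration.

```python
import hashlib
from functools import reduce

def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def digest(data):
    """Stand-in checksum (assumption: SHA-256 used purely for illustration)."""
    return hashlib.sha256(data).hexdigest()

data_chunks = [b"AAAA", b"BBBB", b"CCCC"]      # one chunk per data disk
parity = xor_chunks(data_chunks)               # goes to the parity disk
checksum = digest(b"".join(data_chunks))       # goes into the parent metadata

# A disk is lost: parity plus the surviving chunks rebuild the missing chunk.
rebuilt = xor_chunks([data_chunks[0], data_chunks[2], parity])
assert rebuilt == data_chunks[1]

# A chunk is silently corrupted: the checksum no longer matches,
# which is how the corruption gets detected in the first place.
corrupted = [b"AAAA", b"BxBB", b"CCCC"]
assert digest(b"".join(corrupted)) != checksum
```

The parity is about as large as one data chunk and lives next to the data; the checksum is a short fixed-size digest and lives one level up.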
* Could someone please elaborate more on the statement "Every block is its own RAID-Z stripe"? Is the block being referred to a single block across multiple disks, or on a single disk?
If the storage pool uses an n-way RAID-Z configuration, the (logical) block is first split into n-1 chunks, and an nth chunk of parity is added before any actual I/O takes place. Each chunk goes to a separate disk. This goes hand in hand with the answer to question 2. Because it's copy-on-write, we only need to consider the new data when computing parity.
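The split described above can be sketched roughly as follows. This is a simplification in Python, not the real allocator: the function name raidz1_stripe, the zero padding of the last chunk, and the fixed 512-byte sector size are all assumptions made for illustration, and the actual on-disk layout is more involved.

```python
def raidz1_stripe(block, n_disks, sector=512):
    """Split a logical block into n_disks-1 data chunks plus one XOR parity chunk."""
    n_data = n_disks - 1
    # Sectors needed per data chunk, rounded up.
    sectors = -(-len(block) // (n_data * sector))
    chunk_len = sectors * sector
    # Assumption: pad the tail with zeros so every chunk has the same length.
    padded = block.ljust(n_data * chunk_len, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    # The parity chunk is the byte-wise XOR of all data chunks.
    parity = bytearray(chunk_len)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]

# A 6k logical block on a 4-disk RAID-Z: three 2k data chunks and one 2k parity chunk.
stripe = raidz1_stripe(bytes(range(256)) * 24, n_disks=4)
print([len(chunk) for chunk in stripe])  # [2048, 2048, 2048, 2048]
```

Each element of the returned list would go to a different disk, all within the same write.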
*My sincere apologies if the above questions seem trivial* . But I am really struggling to reconcile the statement and the diagram.
Example

Logical block: (1 6k block of fs data) Could be any size <= 128k

|0|1|_|_|_|5|_|_|_|_|0|_|   (12 x 512b blocks) --> ::checksum::

This is split into a single 4x 2k stripe:

3 chunks of 2k:
|00|01|02|03| --> disk1 (4 sectors)
|04|05|06|07| --> disk2 (4 sectors)
|08|09|10|11| --> disk3 (4 sectors)

1 chunk of parity:
|12|13|14|15| --> disk4 (4 sectors)

::checksum:: is then recorded in the metadata, which gets written in a separate stripe. This is recursed for the metadata checksum, until we reach the ueberblock, for which I won't explain the redundancy and replication here.

Cheers,
Henk
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss