Re: [zfs-discuss] Trying to understand zfs RAID-Z

Henk Langeveld Thu, 17 May 2007 04:00:07 -0700

I'll make an attempt to keep it simple, and tell what is true in 'most'
cases.  For some values of 'most' ;-)


The words used are at times confusing.  "Block" mostly refers to
a logical filesystem block, which can be variable in size.
There are also "checksum" and "parity", which are completely
independent.

    * Do the green and blue "blocks" shown in the diagram on page 11
      represent actual physical blocks on individual disks, or a
      single RAID-Z stripe written across multiple disks?


See page 17: these are logical blocks, and can be variable in size.

    * Where is the parity for RAID-Z?  Surely it is not the checksum
      stored together with the upper-level direct and indirect block
      pointers?  And if it is written as a separate block on another
      disk, then... I am afraid I do not understand.


RAID-Z parity vs. ZFS checksum

The parity is just a chunk of xor-ed data written for redundancy, and
is part of the same I/O transaction.

The checksum is a much smaller digest of the data used for detecting
the various modes of data corruption.  This is what goes into the
metadata (logical) blocks above.  A ZFS file system always has checksums
and can function without parity (e.g. on a single disk or a mirror).
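To make the difference concrete, here is a minimal sketch, assuming simple XOR parity over equal-sized chunks and using SHA-256 (one of the checksum algorithms ZFS can be configured to use) as the digest.  The function names are illustrative only.  Parity is as large as a data chunk and can rebuild a lost one; the checksum is a fixed-size digest that can only tell you the data is wrong.

```python
import hashlib
from functools import reduce

def xor_parity(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (single parity, as in RAID-Z1)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def checksum(data: bytes) -> str:
    """Fixed-size digest of a whole logical block (SHA-256 assumed here)."""
    return hashlib.sha256(data).hexdigest()

chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(chunks)

# Parity can rebuild a missing chunk from the survivors...
rebuilt = xor_parity([chunks[0], chunks[2], parity])
print(rebuilt == chunks[1])  # True

# ...while the checksum only detects that something changed.
print(checksum(b"".join(chunks)) != checksum(b"AAAABBBXCCCC"))  # True
```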

    * Could someone please elaborate on the statement "Every block
      is its own RAID-Z stripe"?  Is the block being referred to a
      single block across multiple disks, or on a single disk?

If the storage pool uses an n-disk RAID-Z configuration, the (logical)
block is first split into n-1 chunks, and an nth chunk holding their
parity is added before any actual I/O takes place.  Each chunk goes to
a separate disk.
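The split can be sketched as follows, under a deliberately simplified model: the block is zero-padded to a whole number of 512-byte sectors per data chunk, and parity is a plain XOR.  The function `raidz_stripe` is hypothetical; the real RAID-Z allocator differs in its details (for instance in how small blocks and padding are handled).

```python
from functools import reduce

SECTOR = 512  # bytes per disk sector (assumed, as in the example below)

def raidz_stripe(block: bytes, ndisks: int) -> list[bytes]:
    """Split one logical block into ndisks-1 data chunks plus one parity chunk.

    Simplified model only: zero-pad so each data chunk is whole sectors.
    """
    ndata = ndisks - 1
    sectors = -(-len(block) // SECTOR)          # ceiling division
    per_chunk = -(-sectors // ndata) * SECTOR   # bytes per data chunk
    padded = block.ljust(per_chunk * ndata, b"\0")
    data = [padded[i * per_chunk:(i + 1) * per_chunk] for i in range(ndata)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data))
    return data + [parity]                      # chunk i goes to disk i+1

# A 6k block on a 4-disk RAID-Z: three 2k data chunks plus 2k of parity.
stripe = raidz_stripe(bytes(range(256)) * 24, ndisks=4)
print([len(c) for c in stripe])  # [2048, 2048, 2048, 2048]
```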

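Splitting every block into its own stripe goes hand in hand with the
answer to question 2.  Because ZFS is Copy-on-Write, we only worry about
new data when computing parity: a block is never updated in place, so
there is no need to read back old data or old parity first.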
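A small sketch of why that matters, assuming single XOR parity.  It contrasts a classic in-place (RAID-5 style) update of one chunk, where the new parity depends on the old chunk and old parity, with a copy-on-write full-stripe write, where parity comes from the new chunks alone.  Both function names are hypothetical.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_for_new_stripe(new_chunks: list[bytes]) -> bytes:
    """Copy-on-write case: a fresh, full stripe; parity needs only new data."""
    parity = bytes(len(new_chunks[0]))
    for chunk in new_chunks:
        parity = xor(parity, chunk)
    return parity

def parity_after_in_place_update(old_parity: bytes, old_chunk: bytes,
                                 new_chunk: bytes) -> bytes:
    """In-place case: old data and old parity must be read from disk first."""
    return xor(xor(old_parity, old_chunk), new_chunk)

# Both routes give the same parity for the same final contents,
# but only the in-place route has to read anything back.
a, b, c = b"\x01\x02", b"\x03\x04", b"\x05\x06"
new_b = b"\xff\x00"
old_parity = parity_for_new_stripe([a, b, c])
print(parity_after_in_place_update(old_parity, b, new_b)
      == parity_for_new_stripe([a, new_b, c]))  # True
```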

*My sincere apologies if the above questions seem trivial*.  But I am
really struggling to reconcile the statement and the diagram.


Example

Logical block: one 6k block of fs data (could be any size <= 128k)
        |0|1|_|_|_|5|_|_|_|_|0|_| (12 x 512b blocks)  --> ::checksum::


This is split into a single stripe of 4 x 2k chunks:

3 chunks of 2k:
        |00|01|02|03|   --> disk1    (4 sectors)
        |04|05|06|07|   --> disk2    (4 sectors)
        |08|09|10|11|   --> disk3    (4 sectors)

1 chunk of parity:
        |12|13|14|15|   --> disk4    (4 sectors)
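The arithmetic behind the diagram, as a sketch: it reproduces the sector numbering above and the space cost for this particular 6k block on 4 disks.  It assumes the block divides evenly into whole sectors per disk, which holds here but not for every block size.

```python
SECTOR = 512          # bytes per sector, as in the example
BLOCK = 6 * 1024      # the 6k logical block
NDISKS = 4            # 3 data chunks + 1 parity chunk

sectors = BLOCK // SECTOR                 # 12 data sectors
per_disk = sectors // (NDISKS - 1)        # 4 sectors (2k) on each disk

# Number the stripe's sectors 00..15 as the diagram does:
# data sectors first, then the parity chunk on the last disk.
layout = {}
for disk in range(NDISKS):
    first = disk * per_disk
    layout[f"disk{disk + 1}"] = list(range(first, first + per_disk))

for name, secs in layout.items():
    kind = "parity" if name == f"disk{NDISKS}" else "data"
    print(name, kind, "|".join(f"{s:02d}" for s in secs))

written = NDISKS * per_disk * SECTOR
print(f"{written} bytes written for {BLOCK} bytes of data")
```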


::checksum:: is then recorded in the metadata, which gets written
in a separate stripe.  The same applies recursively: the checksum of
that metadata block is stored in its parent, and so on up the tree
until we reach the ueberblock, whose redundancy and replication I
won't explain here.
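The recursion can be illustrated with a tiny hash tree.  This is a hypothetical two-level model, not ZFS's on-disk format: an "indirect" block stores the checksums of the data blocks, and a root value (standing in for the ueberblock's pointer) stores the checksum of the indirect block.  SHA-256 is assumed as the digest.  Corrupting any data block, or the metadata itself, is caught by walking down from the root.

```python
import hashlib

def digest(data: bytes) -> bytes:
    """32-byte SHA-256 digest (assumed checksum algorithm)."""
    return hashlib.sha256(data).digest()

# Data blocks -> indirect block holding their checksums -> root checksum.
data_blocks = [b"file data 0", b"file data 1", b"file data 2"]
indirect = b"".join(digest(d) for d in data_blocks)   # metadata contents
root_checksum = digest(indirect)                      # stored one level up

def verify(blocks: list[bytes], indirect: bytes, root: bytes) -> bool:
    """Check the metadata against the root, then each block against the metadata."""
    if digest(indirect) != root:
        return False
    sums = [indirect[i:i + 32] for i in range(0, len(indirect), 32)]
    return len(sums) == len(blocks) and all(
        digest(d) == s for d, s in zip(blocks, sums))

print(verify(data_blocks, indirect, root_checksum))  # True
print(verify([b"file data 0", b"corrupt!", b"file data 2"],
             indirect, root_checksum))               # False
```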


Cheers,
Henk




_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
