Checksum the individual RAID-5 blocks, rather than the entire stripe?

Depending on the number of drives (N) in your RAID-Z, this will increase your metadata size by (N-1) * 32 bytes per block pointer. Wouldn't this be an undesirable increase in metadata size?
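
For a rough feel for that cost, here's a trivial C sketch (purely illustrative) that just evaluates the (N-1) * 32 figure above for a few group widths, on the assumption that going from one whole-block checksum to one 32-byte (256-bit) checksum per device adds N-1 checksums:

  #include <stdio.h>

  /* Extra checksum bytes per block pointer when each of N devices
   * carries its own 32-byte checksum instead of one per block. */
  int
  main(void)
  {
      for (int n = 3; n <= 9; n++)
          printf("%d-wide RAID-Z: %d extra bytes per block pointer\n",
              n, (n - 1) * 32);
      return 0;
  }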

In more detail: allow the block pointer to contain one checksum per device used (the count will vary if you're using a RAID-Z style algorithm). Checksum each device's data independently, so the pointer looks like one of the following (a rough C sketch follows the list):

  (a) array of <device, offset, checksum> tuples
  (b) <device, offset> tuple and array of checksums
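
To make the two layouts concrete, here's a rough C sketch; the type and field names are invented for illustration and are not the real ZFS blkptr_t:

  #include <stdint.h>

  #define CKSUM_WORDS 4  /* 256-bit checksum as four 64-bit words */

  /* Style (a): one <device, offset, checksum> tuple per column, so
   * each column can live anywhere on its device. */
  typedef struct {
      uint32_t dev;                   /* device (vdev) id */
      uint64_t offset;                /* byte offset on that device */
      uint64_t cksum[CKSUM_WORDS];    /* checksum of this column only */
  } bp_col_t;

  typedef struct {
      bp_col_t col[1];                /* really N entries, N = stripe width */
  } bp_style_a_t;

  /* Style (b): a single <device, offset> locating the stripe (RAID-Z
   * style), plus an array of per-device checksums. */
  typedef struct {
      uint32_t dev;
      uint64_t offset;
      uint64_t cksum[1][CKSUM_WORDS]; /* really N checksum entries */
  } bp_style_b_t;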

The former is really nice, as it also allows you to place the block arbitrarily within a disk. This could potentially allow a more efficient implementation of RAID-Z with variable-sized disks (some disks could effectively be concatenated as in a RAID-0, reducing the overall stripe width).

Writing a partial stripe is more interesting. With style (b) block pointers -- RAID-Z style -- you need to do a read/modify/write of the whole stripe to get all the new data into the right place. But with style (a) block pointers, you're back to RAID-5 style writes (read the old data and parity, or the remaining data; write the new data and parity). You don't need to rewrite the whole stripe, since the block pointer can refer to the original partial stripe.
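
For reference, the XOR identity behind that RAID-5 style small write (a generic sketch, not ZFS code): reading the old data and old parity lets you fold the new data into the parity without touching the rest of the stripe.

  #include <stddef.h>
  #include <stdint.h>

  /* new_parity = old_parity ^ old_data ^ new_data, applied in place.
   * After this, the caller writes new_data and the updated parity;
   * the untouched columns of the stripe never need to be read. */
  void
  update_parity(uint8_t *parity, const uint8_t *old_data,
      const uint8_t *new_data, size_t len)
  {
      for (size_t i = 0; i < len; i++)
          parity[i] ^= old_data[i] ^ new_data[i];
  }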

Indeed!

Out of interest, what are the major concerns with increasing the block pointer size beyond its current 128 bytes?


James