Checksum the individual RAID-5 blocks, rather than the entire stripe?

Depending on the number of drives (N) in your RAID-Z, this will increase your metadata size by (N-1) * 32 bytes per block pointer. Wouldn't this be an undesirable increase in metadata size?
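
For a rough feel for that cost, here's a trivial C sketch (purely illustrative) that just evaluates the (N-1) * 32 figure above for a few group widths, on the assumption that going from one whole-block checksum to one 32-byte (256-bit) checksum per device adds N-1 checksums:

  #include <stdio.h>

  /* Extra checksum bytes per block pointer when each of N devices
   * carries its own 32-byte checksum instead of one per block. */
  int
  main(void)
  {
      for (int n = 3; n <= 9; n++)
          printf("%d-wide RAID-Z: %d extra bytes per block pointer\n",
              n, (n - 1) * 32);
      return 0;
  }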

In more detail: allow the block pointer to contain one checksum per device used (the count will vary if you're using a RAID-Z style algorithm). Checksum each device's data independently, so the pointer looks like one of the following (a rough C sketch follows the list):

  (a) array of <device, offset, checksum> tuples
  (b) <device, offset> tuple and array of checksums
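
To make the two layouts concrete, here's a rough C sketch; the type and field names are invented for illustration and are not the real ZFS blkptr_t:

  #include <stdint.h>

  #define CKSUM_WORDS 4  /* 256-bit checksum as four 64-bit words */

  /* Style (a): one <device, offset, checksum> tuple per column, so
   * each column can live anywhere on its device. */
  typedef struct {
      uint32_t dev;                   /* device (vdev) id */
      uint64_t offset;                /* byte offset on that device */
      uint64_t cksum[CKSUM_WORDS];    /* checksum of this column only */
  } bp_col_t;

  typedef struct {
      bp_col_t col[1];                /* really N entries, N = stripe width */
  } bp_style_a_t;

  /* Style (b): a single <device, offset> locating the stripe (RAID-Z
   * style), plus an array of per-device checksums. */
  typedef struct {
      uint32_t dev;
      uint64_t offset;
      uint64_t cksum[1][CKSUM_WORDS]; /* really N checksum entries */
  } bp_style_b_t;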

The former is really nice, as it also allows you to place the block arbitrarily within a disk. This could potentially allow a more efficient implementation of RAID-Z with variable-sized disks (some disks could effectively be concatenated as in a RAID-0, reducing the overall stripe width).

Writing a partial stripe is more interesting. With style (b) block pointers -- RAID-Z style -- you need to do a read/modify/write of the whole stripe to get all the new data into the right place. But with style (a) block pointers, you're back to RAID-5 style writes (read the old data and parity, or the remaining data; write the new data and parity). You don't need to rewrite the whole stripe, since the block pointer can refer to the original partial stripe.
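
For reference, the XOR identity behind that RAID-5 style small write (a generic sketch, not ZFS code): reading the old data and old parity lets you fold the new data into the parity without touching the rest of the stripe.

  #include <stddef.h>
  #include <stdint.h>

  /* new_parity = old_parity ^ old_data ^ new_data, applied in place.
   * After this, the caller writes new_data and the updated parity;
   * the untouched columns of the stripe never need to be read. */
  void
  update_parity(uint8_t *parity, const uint8_t *old_data,
      const uint8_t *new_data, size_t len)
  {
      for (size_t i = 0; i < len; i++)
          parity[i] ^= old_data[i] ^ new_data[i];
  }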

Indeed!

Out of interest, what are the major concerns with increasing the block pointer size beyond its current 128 bytes?


James