On Thu, Sep 13, 2007 at 04:58:10AM +0000, Marc Bevand wrote:
> Pawel Jakub Dawidek <pjd <at> FreeBSD.org> writes:
> >
> > This is how RAIDZ fills the disks (follow the numbers):
> >
> >     Disk0   Disk1   Disk2   Disk3
> >
> >     D0      D1      D2      P3
> >     D4      D5      D6      P7
> >     D8      D9      D10     P11
> >     D12     D13     D14     P15
> >     D16     D17     D18     P19
> >     D20     D21     D22     P23
> >
> > D is data, P is parity.
>
> This layout assumes of course that large stripes have been written to
> the RAIDZ vdev. As you know, the stripe width is dynamic, so it is
> possible for a single logical block to span only 2 disks (for those who
> don't know what I am talking about, see the "red" block occupying LBAs
> D3 and E3 on page 13 of these ZFS slides [1]).
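For readers following the numbering, here is a small sketch that regenerates the full-stripe diagram above: each row holds N-1 data sectors followed by one parity sector on the last disk, numbered sequentially. It only reproduces the illustrative numbering in the diagram; it does not try to model ZFS's actual on-disk column ordering, and the function name full_stripe_layout is hypothetical.

```python
def full_stripe_layout(ndisks: int, nrows: int) -> list[list[str]]:
    """Reproduce the diagram's numbering (illustrative only, not ZFS's
    real column order): each row has ndisks-1 data sectors followed by
    one parity sector on the last disk."""
    rows = []
    n = 0  # running sector number, as in "follow the numbers"
    for _ in range(nrows):
        row = []
        for col in range(ndisks):
            # The last column holds the row's parity; the others hold data.
            label = "P" if col == ndisks - 1 else "D"
            row.append(f"{label}{n}")
            n += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    layout = full_stripe_layout(4, 6)
    print("  ".join(f"Disk{i}" for i in range(4)))
    for row in layout:
        print("  ".join(f"{cell:<5}" for cell in row))
```

Running it with 4 disks and 6 rows prints the same D0 D1 D2 P3 ... D20 D21 D22 P23 grid quoted above.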
Yes, I'm aware of that.

> To read this logical block (and validate its checksum), only D_0 needs
> to be read (LBA E3). So in this very specific case, a RAIDZ read
> operation is as cheap as a RAID5 read operation. [...]

If you do single-sector writes, yes, but this is very inefficient, for
two reasons:

1. Bandwidth - writing one sector at a time? Come on.
2. Space - when you write one sector and its parity, you consume two
   sectors. You may have more than one parity column in that case, e.g.:

       Disk0   Disk1   Disk2   Disk3   Disk4   Disk5
       D0      P0      D1      P1      D2      P2

   In this case the space overhead is the same as with a mirror.

> [...] The existence of these
> small stripes could explain why RAIDZ doesn't perform as bad as RAID5
> in Pawel's benchmark...

No. As I said, the smallest block I used was 2kB, which means four
512-byte sectors plus one 512-byte parity sector, so each 2kB block
uses all 5 disks.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
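To make the space argument concrete, here is a rough sketch of the sector accounting for one logical block on a single-parity RAIDZ vdev. It is a simplified model, not ZFS's allocator: it assumes 512-byte sectors, one parity sector per row of up to N-1 data sectors, and it ignores any allocation padding ZFS itself may add. The names raidz1_block_cost and BlockCost are hypothetical.

```python
import math
from typing import NamedTuple


class BlockCost(NamedTuple):
    data_sectors: int
    parity_sectors: int
    disks_written: int  # data disks plus the parity disk
    disks_read: int     # disks holding data, i.e. touched by a read that skips parity


def raidz1_block_cost(block_bytes: int, ndisks: int, sector: int = 512) -> BlockCost:
    """Simplified single-parity RAIDZ accounting for one logical block.

    Assumes one parity sector per row of up to (ndisks - 1) data sectors
    and ignores any allocation padding ZFS may add.
    """
    if ndisks < 2:
        raise ValueError("RAIDZ1 needs at least 2 disks")
    data = math.ceil(block_bytes / sector)
    width = ndisks - 1             # data columns available per row
    parity = math.ceil(data / width)
    data_disks = min(data, width)  # a short block spans fewer disks
    return BlockCost(data, parity, data_disks + 1, data_disks)


if __name__ == "__main__":
    for size, disks in [(512, 5), (2048, 5), (512, 6)]:
        c = raidz1_block_cost(size, disks)
        overhead = c.parity_sectors / c.data_sectors
        print(f"{size:>5} B on {disks} disks: {c}, parity overhead {overhead:.0%}")
```

Under these assumptions a single-sector block costs two sectors (100% parity overhead, as with a mirror, and three such blocks fill the six-disk row D0 P0 D1 P1 D2 P2), while a 2kB block on 5 disks costs four data sectors plus one parity sector and touches every disk.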
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss