Erik: does that mean it's better to keep the number of data drives in a raidz(n) to a power of two? In the example you gave, you mentioned 14k being written to each drive. That doesn't sound very efficient to me.
(When I say the above, I mean a five-disk raidz or a ten-disk raidz2, etc.)

Cheers,

On 9 September 2010 18:58, Erik Trimble <erik.trim...@oracle.com> wrote:
> The thing that folks tend to forget is that RaidZ is IOPS-limited. For the
> most part, if I want to reconstruct a single slab (stripe) of data, I have
> to issue a read to EACH disk in the vdev, and wait for that disk to return
> the value, before I can write the computed parity value out to the disk
> under reconstruction.
>
> This is *regardless* of the amount of data being reconstructed.
>
> So, the bottleneck tends to be the IOPS value of the single disk being
> reconstructed. Thus, having fewer disks in a vdev leads to less data being
> required to be resilvered, which leads to fewer IOPS being required to
> finish the resilver.
>
> Example (for ease of calculation, let's use the disk-drive manufacturer's
> cheat of 1k = 1000 bytes):
>
> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k slab
> sizes. Thus, 32k of data for each slab is written to each disk (4 x 32k
> data + 32k parity for a 128k slab). So, each IOPS gets to reconstruct 32k
> of data on the failed drive. It thus takes about 1TB/32k = 31e6 IOPS to
> reconstruct the full 1TB drive.
>
> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab
> sizes. In this case, there is only about 14k of data on each drive for a
> slab (128k across 9 data drives). This means each IOPS to the failed drive
> writes only 14k, so it takes 1TB/14k = 71e6 IOPS to complete.
>
> From this, it is easy to see that the number of IOPS required to the
> resilvered disk goes up linearly with the number of data drives in a
> vdev. Since you are always going to be IOPS-bound by the single disk
> resilvering, you have a fixed limit.
>
> In addition, remember that having more disks means you have to wait longer
> for each IOPS to complete. That is, it takes longer (fractionally, but in
> the aggregate a measurable amount) for 9 drives to each return 14k of data
> than it does for 4 drives to return 32k. This is due to rotational and
> seek-access delays. So, not only do you have to do more total IOPS in
> Scenario 2, but each IOPS takes longer to complete (the read cycle takes
> longer, while the write/reconstruct cycle takes the same amount of time).
>
> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
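The back-of-the-envelope numbers in the quoted scenarios can be checked with a short sketch. This is a hypothetical helper (not ZFS code), and it uses Erik's simplified model of one read/write IOP per slab per drive with 1k = 1000 bytes; real resilver behavior is more complicated:

```python
def resilver_iops(total_disks, parity_disks, slab_bytes, drive_bytes):
    """Estimate resilver work for a raidz vdev under the simplified model:
    each slab spreads its data evenly over the data disks, and rebuilding
    the failed drive costs one IOP per slab it holds."""
    data_disks = total_disks - parity_disks
    per_drive = slab_bytes / data_disks   # data per disk for each slab
    iops = drive_bytes / per_drive        # slabs (hence IOPS) to rebuild the drive
    return per_drive, iops

TB = 1_000_000_000_000  # drive-maker's 1k = 1000 cheat, per the example

# Scenario 1: 5-disk raidz1, 128k slabs -> 32k per disk, ~31e6 IOPS
print(resilver_iops(5, 1, 128_000, TB))

# Scenario 2: 10-disk raidz1, 128k slabs -> ~14k per disk, ~70e6 IOPS
print(resilver_iops(10, 1, 128_000, TB))
```

Note the rounding: 128k over 9 data disks is 14.2k, so the exact figure is about 70e6 IOPS rather than 71e6, which doesn't change the conclusion that required IOPS scale linearly with the number of data drives.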
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss