Erik: does that mean that keeping the number of data drives in a raidz(n)
to a power of two is better?  In the example you gave, you mentioned about
14k being written to each drive per slab, which doesn't sound very
efficient to me.

(When I say the above, I mean a five-disk raidz or a ten-disk raidz2, etc.)
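
Here's roughly what I'm picturing (a quick sketch, assuming the 128k record
splits evenly across the data drives and ignoring sector rounding):

    # Per-drive chunk size for a 128k slab at various data-drive counts,
    # using the 1k = 1000 bytes cheat from the example.
    SLAB = 128 * 1000

    for data_drives in (4, 5, 8, 9):
        chunk_k = SLAB / data_drives / 1000
        print(f"{data_drives} data drives -> ~{chunk_k:.1f}k per drive per slab")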

Cheers,

On 9 September 2010 18:58, Erik Trimble <erik.trim...@oracle.com> wrote:
>
> The thing that folks tend to forget is that RaidZ is IOPS-limited.  For
> the most part, if I want to reconstruct a single slab (stripe) of data, I
> have to issue a read to EACH surviving disk in the vdev and wait for every
> one of them to return its piece before I can compute the missing value and
> write it out to the disk under reconstruction.
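>
> In schematic pseudo-code, the per-slab work looks roughly like this (just
> an illustration of the I/O pattern, not actual ZFS code; the Disk class
> and its read/write methods are made up for the sketch, single-parity case):
>
>     from functools import reduce
>
>     class Disk:
>         def __init__(self, size):
>             self.data = bytearray(size)
>         def read(self, offset, length):
>             return bytes(self.data[offset:offset + length])
>         def write(self, offset, chunk):
>             self.data[offset:offset + len(chunk)] = chunk
>
>     def resilver_slab(surviving, rebuilt, offset, chunk_len):
>         # One read I/O to EVERY surviving disk; we wait for all of them...
>         pieces = [d.read(offset, chunk_len) for d in surviving]
>         # ...before the missing chunk can be computed (XOR for single parity)...
>         missing = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pieces)
>         # ...and written out with a single I/O to the disk under reconstruction.
>         rebuilt.write(offset, missing)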
>
> This is *regardless* of the amount of data being reconstructed.
>
> So, the bottleneck tends to be the IOPS of the single disk being
> reconstructed.  Having fewer disks in a vdev means more data lands on each
> disk per slab, so fewer slabs, and therefore fewer I/Os to the new disk,
> are needed to resilver the same drive.
>
>
> Example (for ease of calculation, let's do the disk-drive mfg's cheat of 1k
> = 1000 bytes):
>
> Scenario 1:    I have five 1TB disks in a raidz1 and a 128k slab size.
> Each slab puts 32k on each disk (4 x 32k data + 32k parity for the 128k
> slab).  So each write I/O to the failed drive reconstructs 32k of data,
> and it takes about 1TB/32k = 31e6 I/Os to reconstruct the full 1TB drive.
>
> Scenario 2:    I have ten 1TB drives in a raidz1, with the same 128k slab
> size.  Now there's only about 14k of data on each drive per slab (128k
> spread across nine data drives), so each I/O to the failed drive only
> writes about 14k, and it takes about 1TB/14k = 71e6 I/Os to complete.
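>
> A quick sketch of that arithmetic (same assumptions as above: 1k = 1000
> bytes, 1TB = 10^12 bytes, a 128k slab split evenly across the data drives,
> ignoring sector rounding and metadata):
>
>     DISK = 10**12       # 1TB drive, manufacturer's bytes
>     SLAB = 128 * 1000   # 128k of data per slab
>
>     def resilver_ios(total_disks, parity_disks=1):
>         data_disks = total_disks - parity_disks
>         per_disk = SLAB / data_disks   # bytes written to the new disk per slab
>         return DISK / per_disk         # write I/Os needed for the whole drive
>
>     print(f"5-wide raidz1:  ~{resilver_ios(5):.1e} I/Os")    # ~3.1e+07
>     print(f"10-wide raidz1: ~{resilver_ios(10):.1e} I/Os")   # ~7.0e+07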
>
>
> From this it's pretty easy to see that the number of I/Os required to the
> disk being resilvered goes up linearly with the number of data drives in a
> vdev.  And since you're always bound by the IOPS of that single resilvering
> disk, which is a fixed rate, more required I/Os translates directly into a
> longer resilver.
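>
> (That's just the arithmetic above in closed form:
>
>     I/Os to the new disk  =  disk_size / (slab_size / data_drives)
>                           =  data_drives * disk_size / slab_size
>
> so the required I/O count is directly proportional to the number of data
> drives.)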
>
> In addition, remember that having more disks means you wait longer for
> each I/O to complete.  A slab's read phase isn't finished until the
> slowest of the surviving disks has returned its piece, and with more disks
> in play the odds that one of them needs a long seek go up.  That is, it
> takes longer (fractionally, but in the aggregate a measurable amount) for
> 9 drives to each return 14k of data than it does for 4 drives to each
> return 32k, due to rotational and seek delays.  So not only do you have to
> do more total I/Os in Scenario 2, but each one takes longer to complete
> (the read phase takes longer, while the write/reconstruct phase takes the
> same amount of time).
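>
> A toy model of that effect (the numbers here are made up purely for
> illustration: pretend each surviving disk's service time for its read is
> uniformly distributed between 4 and 12 ms, and the slab's read phase
> finishes only when the slowest disk has answered):
>
>     import random
>     random.seed(0)
>
>     def mean_read_phase_ms(n_readers, trials=100_000):
>         # The read phase completes when the SLOWEST of the n readers returns,
>         # so each trial takes the max of n independent service times.
>         return sum(max(random.uniform(4, 12) for _ in range(n_readers))
>                    for _ in range(trials)) / trials
>
>     print(f"4 readers: ~{mean_read_phase_ms(4):.1f} ms per slab")  # ~10.4 ms
>     print(f"9 readers: ~{mean_read_phase_ms(9):.1f} ms per slab")  # ~11.2 ms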
>
>
>
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)