Mattias, what you say makes a lot of sense. When I saw *Both of the above
situations resilver in equal time*, I was like "no way!" But like you said,
assuming no bus bottlenecks.

This is my exact breakdown (cheap disks on a cheap bus :P):

PCIe x8 4-port eSATA RAID controller.
4 x eSATA-to-5-port-SATA port multipliers (each connected to an eSATA port on
the controller).
20 x Samsung 1TB HDDs (each connected to a port multiplier).

The PCIe x8 slot gives me 4GB/s, which is 32Gbps; no problem there. Each
eSATA port guarantees 3Gbps, so the controller tops out at 12Gbps. Each PM
can deliver up to 3Gbps, shared among 5 drives. According to Samsung's site,
the max read speed is 250MB/s, which translates to 2Gbps; multiplied by 5
drives that's 10Gbps, or 333% of what one PM can carry. So the drives aren't
likely to sustain their max read speed for long stretches, especially during
a rebuild.
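
Here's a quick sanity check of that arithmetic in Python (just a sketch; the
speeds are the spec figures quoted above, not measured throughput):

# Quick sanity check of the bus arithmetic above. All figures are the
# quoted spec numbers from this thread, not measured throughput.

PCIE_X8_GBPS = 4 * 8               # 4GB/s slot -> 32Gbps
ESATA_PORT_GBPS = 3                # per eSATA port on the controller
PORTS = 4
PM_GBPS = 3                        # each PM shares one 3Gbps link
DRIVES_PER_PM = 5
DRIVE_READ_GBPS = 250 * 8 / 1000   # 250MB/s per Samsung's spec -> 2Gbps

controller_limit = PORTS * ESATA_PORT_GBPS       # 12Gbps
demand_per_pm = DRIVES_PER_PM * DRIVE_READ_GBPS  # 10Gbps
oversubscription = demand_per_pm / PM_GBPS       # ~3.33x

print(PCIE_X8_GBPS, controller_limit, demand_per_pm, f"{oversubscription:.0%}")
# -> 32 12 10.0 333%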

So the bus is going to be quite a bottleneck. Let's assume the drives are
80% full. That's 800GB that needs to be read from each of the 9 surviving
drives in a 10-disk vdev, which is (800GB x 9) 7.2TB.
Best case, we can read that 7.2TB at 3Gbps:
7.2TB = 57.6Tb = 57,600Gb
57,600Gb / 3Gbps = 19,200 seconds
= 320 minutes
= 5 hours 20 minutes.
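
Same estimate as a tiny script (assumes 9 surviving drives, 800GB read from
each, all funneled through one 3Gbps link, ignoring write traffic and seek
overhead):

# Resilver-time estimate from the figures above: 9 surviving drives,
# 800GB to read from each, everything funneled through one 3Gbps link
# (best case, ignoring write traffic and seek overhead).

surviving_drives = 9
gb_per_drive = 800                 # 80% of a 1TB drive
link_gbps = 3

total_gb = surviving_drives * gb_per_drive    # 7200GB = 7.2TB
total_gbit = total_gb * 8                     # 57,600Gb
seconds = total_gbit / link_gbps              # 19,200s
print(seconds, seconds / 60, seconds / 3600)  # -> 19200.0 320.0 5.33...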

Even if it takes twice that long, I'm happy.

Initially I had been thinking 2 PMs per vdev. But now I'm thinking I should
spread each vdev as wide as I can ([2 disks per PM] x 2 + [3 disks per PM] x
2). That gives the best possible speed, but still won't max out the HDDs.
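
A small sketch comparing the per-vdev read bandwidth of the two layouts,
using the same spec numbers (2Gbps per drive, 3Gbps per PM) and assuming
only one vdev is resilvering, so its disks get each shared PM link to
themselves:

# Per-vdev read bandwidth for the two layouts, using the spec numbers above
# (2Gbps per drive, 3Gbps per PM). Assumes only one vdev is resilvering,
# so its disks get each shared PM link to themselves.

DRIVE_GBPS = 2
PM_GBPS = 3

def vdev_read_gbps(disks_per_pm):
    """disks_per_pm: how many of this vdev's disks sit on each PM."""
    return sum(min(PM_GBPS, n * DRIVE_GBPS) for n in disks_per_pm)

print(vdev_read_gbps([5, 5]))        # 2 PMs per vdev      -> 6
print(vdev_read_gbps([2, 2, 3, 3]))  # spread across 4 PMs -> 12

12Gbps is also exactly the controller's limit, and the 10 drives themselves
could push 20Gbps, so even the wide layout still won't max out the HDDs.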

I've never actually sat and done the math before. Hope it's decently accurate
:)

On Wed, Sep 8, 2010 at 3:27 PM, Edward Ned Harvey <sh...@nedharvey.com> wrote:

> > From: pantz...@gmail.com [mailto:pantz...@gmail.com] On Behalf Of
> > Mattias Pantzare
> >
> > It
> > is about 1 vdev with 12 disk or  2 vdev with 6 disks. If you have 2
> > vdev you have to read half the data compared to 1 vdev to resilver a
> > disk.
>
> Let's suppose you have 1T of data.  You have 12-disk raidz2.  So you have
> approx 100G on each disk, and you replace one disk.  Then 11 disks will
> each
> read 100G, and the new disk will write 100G.
>
> Let's suppose you have 1T of data.  You have 2 vdev's that are each 6-disk
> raidz1.  Then we'll estimate 500G is on each vdev, so each disk has approx
> 100G.  You replace a disk.  Then 5 disks will each read 100G, and 1 disk
> will write 100G.
>
> Both of the above situations resilver in equal time, unless there is a bus
> bottleneck.  21 disks in a single raidz3 will resilver just as fast as 7
> disks in a raidz1, as long as you are avoiding the bus bottleneck.  But 21
> disks in a single raidz3 provides better redundancy than 3 vdev's each
> containing a 7 disk raidz1.
>
> In my personal experience, approx 5 disks can max out approx 1 bus.  (It
> actually ranges from 2 to 7 disks, if you have an imbalance of cheap disks
> on a good bus, or good disks on a crap bus, but generally speaking people
> don't do that.  Generally people get a good bus for good disks, and cheap
> disks for crap bus, so approx 5 disks max out approx 1 bus.)
>
> In my personal experience, servers are generally built with a separate bus
> for approx every 5-7 disk slots.  So what it really comes down to is ...
>
> Instead of the Best Practices Guide saying "Don't put more than ___ disks
> into a single vdev," the BPG should say "Avoid the bus bandwidth bottleneck
> by constructing your vdev's using physical disks which are distributed
> across multiple buses, as necessary per the speed of your disks and buses."
>
>
