Richard Elling wrote:
Some history below...
Scott Lawson wrote:
Michael Shadle wrote:
On Mon, Apr 27, 2009 at 4:51 PM, Scott Lawson
<scott.law...@manukau.ac.nz> wrote:
If possible though, you would be best to let the 3ware controller expose the
16 disks as a JBOD to ZFS and create a RAIDZ2 within Solaris, as you will then
gain the full benefits of ZFS: block self-healing etc. etc.
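A minimal sketch of that layout, assuming the 16 JBOD disks show up as
c1t0d0 through c1t15d0 and a pool named "tank" (both are illustrative;
substitute whatever 'format' reports on your box):

   # one pool, one 16-disk RAIDZ2 vdev (~2 disks' worth of parity)
   zpool create tank raidz2 \
       c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
       c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0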
There isn't an issue in using a larger number of disks in a RAIDZ2, just that
it is not the optimal size. Larger vdevs in a zpool mean longer rebuild times
(although this is proportional to how full the pool is). Two parity disks give
you greater coverage in the event of a drive failing in a large vdev stripe.
Hmm, this is a bit disappointing to me. I would have dedicated only 2
disks out of 16 then to a single large raidz2 instead of two 8 disk
raidz2's (meaning 4 disks went to parity)
No, I was referring to a single RAIDZ2 vdev of 16 disks in your pool, so you
would effectively lose ~2 disks to parity. The larger the stripe, potentially
the slower the rebuild. If you had multiple smaller-stripe vdevs in a pool you
would get less performance degradation by virtue of I/O isolation; of course
you lose pool capacity here. With smaller vdevs you could also potentially use
plain RAIDZ rather than RAIDZ2, and then you would still have an
equivalent-sized pool with two parity disks, one per vdev.
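For contrast with the single 16-disk RAIDZ2 above, the two-vdev version would
look something like this (again with hypothetical c1t*d0 device names and a
pool called "tank"):

   # two 8-disk RAIDZ vdevs: one parity disk per vdev, two for the pool,
   # shorter stripes to resilver per vdev
   zpool create tank \
       raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
       raidz c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0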
A few years ago, Sun introduced the X4500 (aka Thumper), which had 48 disks in
the chassis. Of course, the first thing customers did was to make a
single-level 46 or 48 disk raidz set. The second thing they did was complain
that the resulting performance sucked. So the "solution" was to try and put
some sort of practical limit into the docs to help people not hurt themselves.
After much research (down at the pub? :-) the recommendation you see in the
man page was the consensus. It has absolutely nothing to do with correctness
of design or implementation. It has everything to do with setting expectations
of "goodness."
Sure, I understand this. I was a beta tester for the J4500 because I mostly
prefer SPARC systems for Solaris. Certainly for these large disk systems the
preferred layout of around 5-6 drives per vdev is what I use on my assortment
of *4500 series devices. My production J4500's with 48 x 1 TB drives yield
around 31 TB usable. A 10 Gig attached T5520 will pretty much saturate the
3Gb/s SAS HBA connecting it to the J4500. ;)
Being that this is a home NAS for Michael, serving large contiguous files with
fairly low random access requirements, I would imagine that these rules of
thumb can most likely be relaxed a little. As you state, they are a rule of
thumb for generic loads. This list does appear to be attracting people wanting
to use ZFS at home, where capacity tends to be a bigger requirement than
performance.
As I always advise people: test with *your* workload, as *your* requirements
may be different to the next man's. If you favor capacity over performance
then a larger vdev of a dozen or so disks will work 'OK' in my experience.
(I do routinely get referred to Sun customers in NZ as a site that actually
uses ZFS in production and doesn't just play with it.)
I have tested the aforementioned Thumpers with just this sort of config
myself, with varying results on varying workloads: video servers, Sun Email,
etc. Long time ago now.
I also have hardware-backed RAID 6s consisting of 16 drives in 6000 series
storage on Crystal firmware, which work just fine in the hardware RAID world
(where I want capacity over speed). This is real-world, production-class
stuff, and I have ZFS overlaid on top of it as well.
But it is good that we are emphasizing the trade-offs that any config has.
Everyone can learn from these sorts of discussions. ;)
One thing you haven't mentioned is the drive type and size that you are
planning to use, as this greatly influences what people here would recommend.
RAIDZ2 is built for big, slow SATA disks, as reconstruction times in large
RAIDZs and RAIDZ2s increase the risk of vdev failure significantly due to the
time taken to resilver to a replacement drive. Hot spares are your friend!
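A quick way to do that, assuming a pool named "tank" and a spare disk at
(hypothetically) c1t16d0:

   # attach a hot spare so a resilver can start without waiting for a hands-on swap
   zpool add tank spare c1t16d0
   zpool status tank    # the spare shows up under a 'spares' section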
The concern with large drives is unrecoverable reads during resilvering.
One contributor to this is superparamagnetic decay, where the bits are
lost over time as the medium tries to revert to a more steady state.
To some extent, periodic scrubs will help repair these while the disks
are otherwise still good. At least one study found that this can occur
even when scrubs are done, so there is an open research opportunity
to determine the risk and recommend scrubbing intervals. To a lesser
extent, hot spares can help reduce the hours it may take to physically
repair the failed drive.
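As a sketch, again assuming a pool named "tank": a scrub is just 'zpool scrub
tank', and a periodic one can be driven from root's crontab (the monthly
interval here is only an example, not a recommendation):

   # run a scrub now; check the result later with 'zpool status'
   zpool scrub tank

   # crontab entry: scrub at 03:00 on the 1st of each month
   0 3 1 * * /usr/sbin/zpool scrub tank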
+1
I was still operating under the impression that vdevs larger than 7-8
disks typically make baby Jesus nervous.
You did also state that this is a system to be used for backups? So
availability is five 9's?
I do not believe you can achieve five 9s with current consumer disk
drives for an extended period, say >1 year.
Sorry, that was supposed to be "isn't". Not sure what happened to the "n't". ;)
However, I have had some interesting conversations with friends at HDS on the
reliability of slower-speed SATA drives versus so-called enterprise-class 10K
and 15K FC disks. The results they were seeing were surprising: from
recollection, much lower failure rates on the slower 7200 RPM SATA and SAS
disks.
Are you planning on using OpenSolaris or mainstream Solaris 10? Mainstream
Solaris 10 is more conservative and can be placed under a support agreement if
need be.
Mainstream Solaris 10 gets a port of ZFS from OpenSolaris, so its
features are fewer and later. As time ticks away, fewer features
will be back-ported to Solaris 10. Meanwhile, you can get a production
support agreement for OpenSolaris.
Sure, if you want to run it on x86. I believe sometime in 2009 we will see a
SPARC release of OpenSolaris. I understand that it is to be the next
OpenSolaris release, but I wouldn't hold my breath.
http://www.sun.com/service/opensolaris/index.jsp
Yup seen it ages ago. Problem is as above.
-- richard
--
_______________________________________________________________________
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006, Manukau City
Auckland, New Zealand
Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611
mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz
________________________________________________________________________
perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
________________________________________________________________________
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss