> 2012-03-21 16:41, Paul Kraus wrote:
> >      I have been running ZFS in a mission critical application since
> > zpool version 10 and have not seen any issues with some of the vdevs
> > in a zpool full while others are virtually empty. We have been running
> > commercial Solaris 10 releases. The configuration was that each
> 
> Thanks for sharing some real-life data from larger deployments,
> as you often did. That's something I don't often have access
> to nowadays, at least not with the liberty to talk about it :)

Here's another datapoint, then: 

I'm using sol10u9 and u10 on a number of supermicro boxes,
mostly X8DTH boards with LSI 9211/9208 controllers and E5600 CPUs.
Application is NFS file service to a bunch of clients, and 
we also have an in-house database application written in Java
which implements a column-oriented db in files. Just about all
of it is raidz2, much of it running gzip-compressed.
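
For the archives, the compression bit is just a per-dataset
property, roughly like this (dataset name made up):

  % zfs set compression=gzip srv/archive
  % zfs get compression srv/archive

Plain "gzip" means gzip-6; gzip-1 through gzip-9 are also
accepted if you want to trade CPU against compression ratio.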

Since I can't find anything saying not to - other than some common
wisdom about not putting all your eggs in one basket, which I'm
choosing to reject in some cases - I just keep adding vdevs to
the pool. We started with 2TB Barracudas for dev/test/archive
use and Constellations for prod, moved on to 3TB drives, and have
just added some of the new Pipeline drives, with nothing
particularly of interest to note therefrom.
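
For anyone curious about the mechanics, growing the pool really
is just repeated zpool add, roughly like the below with made-up
device names; -n does a dry run, which is worth it before
committing a 7-disk mistake:

  % zpool add -n srv raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0
  % zpool add srv raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0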

You can create a startlingly large pool this way:

ny-fs7(68)% zpool list
NAME   SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
srv    177T   114T  63.3T    64%  ONLINE  -

Most pools are smaller. This one is an archive box that's also
the guinea pig: 12 raidz2 vdevs of 7 drives each. The largest prod
pool is 130TB in 11 raidz2 vdevs of 8 drives each; I won't guess
at the mix of 2TB and 3TB drives. These are both sol10u9.
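
If anyone wants to see just how unevenly the older vdevs fill
up compared to the newer ones (the full-vs-empty vdev question
from the start of this thread), the per-vdev alloc/free
breakdown is all there in

  % zpool iostat -v srv

run against the pool above.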

Another box has 150TB in 6 pools, raidz2/gzip using 2TB
Constellations, dual X5690s with 144GB RAM running 20-30
Java db workers. We do manage to break this box on the
odd occasion - there's a race condition in the ZIO code
where a buffer can be freed while the block buffer is in
the process of being "loaned" out to the compression code.
However, it takes 600 zpool threads plus another 600-900
Java threads running at the same time, with a backlog of
80,000 ZIOs in the queue, so it's not the sort of thing that
anyone is likely to run across much. :) It's fixed
in sol11, I understand; our intended fix, however, is
to split the whole thing so that the workload (which
for various reasons needs to be on one box) moves
to a 4-socket Westmere, with all of the data pools
served via NFS from other boxes.
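
The serving side of that is nothing exotic - just the sharenfs
property on whatever datasets end up on the other boxes,
something like this with a made-up dataset name:

  % zfs set sharenfs=on srv/dbdata
  % zfs get sharenfs srv/dbdata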

I did lose some data once, long ago, using LSI 1068-based
controllers on older kit, but I can pretty much attribute
that to something between my own stupidity and the 1068s
really not being especially friendly towards the LSI
expander chips in the older 3Gb/s SMC backplanes when used
for SATA-over-SAS tunneling. The current arrangements
are otherwise pretty solid.

The SATA-based boxes can be a little cranky when a drive
toasts, of course - they sit and hang for a while until they
finally decide to offline the drive. We take that as par
for the course; for the application in question (basically,
storing huge amounts of data on the odd occasion that someone
has a need for it), it's not exactly a showstopper.
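
When one does toast, the standard tools are enough to see what
is going on and push it along - roughly, with a made-up device
name:

  % zpool status -x
  % fmdump -e | tail              # recent FMA error telemetry
  % zpool offline srv c5t3d0      # rather than waiting it out

after which it's the usual zpool replace and resilver dance.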


I am curious as to whether there is any practical upper limit
on the number of vdevs, or how far one might push this kind of
configuration in terms of pool size - assuming a sufficient
quantity of RAM, of course. I'm sure I will need to
split this up someday, but for this application there's just
something hideously convenient about leaving it all in one
filesystem in one pool.


-bacon

 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
