Thanks for sharing, Jeff! Comments below...

On Mar 24, 2012, at 4:33 PM, Jeff Bacon wrote:
>> 2012-03-21 16:41, Paul Kraus wrote:
>>> I have been running ZFS in a mission critical application since
>>> zpool version 10 and have not seen any issues with some of the vdevs
>>> in a zpool full while others are virtually empty. We have been running
>>> commercial Solaris 10 releases. The configuration was that each
>>
>> Thanks for sharing some real-life data from larger deployments,
>> as you often did. That's something I don't often have access
>> to nowadays, with the liberty to tell :)
>
> Here's another datapoint, then:
>
> I'm using sol10u9 and u10 on a number of Supermicro boxes,
> mostly X8DTH boards with LSI 9211/9208 controllers and E5600 CPUs.
> The application is NFS file service to a bunch of clients, and
> we also have an in-house database application written in Java
> which implements a column-oriented db in files. Just about all
> of it is raidz2, much of it running gzip-compressed.
>
> Since I can't find anything saying not to, other than some common
> wisdom about not putting all your eggs in one basket (which I'm
> choosing to reject in some cases), I just keep adding vdevs to
> the pool. We started with 2TB Barracudas for dev/test/archive
> use and Constellations for prod, then 3TB drives, and have just
> added some of the new Pipeline drives, with nothing of particular
> interest to note therefrom.
>
> You can create a startlingly large pool this way:
>
> ny-fs7(68)% zpool list
> NAME  SIZE  ALLOC  FREE   CAP  HEALTH  ALTROOT
> srv   177T  114T   63.3T  64%  ONLINE  -
>
> Most pools are smaller. This is an archive box that's also
> the guinea pig: 12 vdevs of 7 drives in raidz2. The largest prod
> one is 130TB in 11 vdevs of 8 drives in raidz2. I won't guess
> at the mix of 2TB and 3TB. These are both sol10u9.
>
> Another box has 150TB in 6 pools, raidz2/gzip using 2TB
> Constellations, dual X5690s with 144GB RAM running 20-30
> Java db workers. We do manage to break this box on the
> odd occasion - there's a race condition in the ZIO code
> where a buffer can be freed while the block buffer is in
> the process of being "loaned" out to the compression code.
> However, it takes 600 zpool threads plus another 600-900
> Java threads running at the same time with a backlog of
> 80000 ZIOs in queue, so it's not the sort of thing that
> anyone's likely to run across much. :) It's fixed in sol11,
> I understand; however, our intended fix is to split the
> whole thing so that the workload (which for various reasons
> needs to be on one box) is moved to a 4-socket Westmere,
> and all of the data pools are served via NFS from other boxes.
>
> I did lose some data once, long ago, using LSI 1068-based
> controllers on older kit, but I can pretty much attribute
> that to something between me being stupid and the 1068s
> really not being especially friendly towards the LSI
> expander chips in the older 3Gb/s SMC backplanes when used
> for SATA-over-SAS tunneling. The current arrangements
> are pretty solid otherwise.

In general, mixing SATA and SAS directly behind expanders (e.g.
without SAS/SATA interposers) seems to be bad juju that an OS
can't fix.

> The SATA-based boxes can be a little cranky when a drive
> toasts, of course - they sit and hang for a while until they
> finally decide to offline the drive. We take that as par
> for the course; for the application in question (basically,
> storing huge amounts of data on the odd occasion that someone
> has a need for it), it's not exactly a showstopper.
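For anyone following along at home: the "just keep adding vdevs"
growth pattern Jeff describes above is nothing exotic. A minimal
sketch, using a made-up pool name and made-up cXtYdZ device names
rather than anything from Jeff's actual boxes:

  # create a pool with one 7-drive raidz2 top-level vdev
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0

  # gzip-compress everything written to the pool's datasets
  zfs set compression=gzip tank

  # grow the pool later by tacking on another 7-drive raidz2 vdev
  zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

  # per-vdev capacity and activity: a freshly added vdev starts out
  # nearly empty while the older vdevs stay as full as they were
  zpool iostat -v tank

Existing data is never rebalanced across vdevs; ZFS just weights new
allocations toward the emptier ones, which is why the mixed-fullness
picture Paul described at the top of the thread is normal rather than
a problem.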
>
> I am curious as to whether there is any practical upper limit
> on the number of vdevs, or how far one might push this kind of
> configuration in terms of pool size - assuming a sufficient
> quantity of RAM, of course... I'm sure I will need to split
> this up someday, but for the application there's just something
> hideously convenient about leaving it all in one filesystem in
> one pool.

I've run pools with > 100 top-level vdevs. It is not uncommon to see
40+ top-level vdevs.
 -- richard

--
DTrace Conference, April 3, 2012,
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422