> 2012-03-21 16:41, Paul Kraus wrote:
> > I have been running ZFS in a mission critical application since
> > zpool version 10 and have not seen any issues with some of the vdevs
> > in a zpool full while others are virtually empty. We have been running
> > commercial Solaris 10 releases. The configuration was that each
>
> Thanks for sharing some real-life data from larger deployments,
> as you often did. That's something I don't often have access
> to nowadays, with a liberty to tell :)
Here's another datapoint, then: I'm using sol10u9 and u10 on a number of Supermicro boxes, mostly X8DTH boards with LSI 9211/9208 controllers and E5600 CPUs. The application is NFS file service to a bunch of clients, plus an in-house database application written in Java which implements a column-oriented db in files. Just about all of it is raidz2, much of it running gzip-compressed.

Since I can't find anything saying not to, other than some common wisdom about not putting all your eggs in one basket that I'm choosing to reject in some cases, I just keep adding vdevs to the pool. We started with 2TB Barracudas for dev/test/archive usage and Constellations for prod, now 3TB drives, and have just added some of the new Pipeline drives with nothing particularly of interest to note therefrom. You can create a startlingly large pool this way:

ny-fs7(68)% zpool list
NAME   SIZE  ALLOC   FREE  CAP  HEALTH  ALTROOT
srv    177T   114T  63.3T  64%  ONLINE  -

Most pools are smaller. This is an archive box that's also the guinea pig: 12 vdevs of 7 drives, raidz2. The largest prod one is 130TB in 11 vdevs of 8 drives, raidz2; I won't guess at the mix of 2TB and 3TB. These are both sol10u9. Another box has 150TB in 6 pools, raidz2/gzip using 2TB Constellations, dual X5690s with 144GB RAM running 20-30 Java db workers.

We do manage to break this box on the odd occasion - there's a race condition in the ZIO code where a buffer can be freed while the block buffer is in the process of being "loaned" out to the compression code. However, it takes 600 zpool threads plus another 600-900 Java threads running at the same time with a backlog of 80000 ZIOs in queue, so it's not the sort of thing anyone is likely to run across much. :) It's fixed in sol11, I understand; however, our intended fix is to split the whole thing so that the workload (which for various reasons needs to be on one box) is moved to a 4-socket Westmere, and all of the data pools are served via NFS from other boxes.

I did lose some data once, long ago, using LSI 1068-based controllers on older kit, but I can pretty much attribute that to something between me being stupid and the 1068s really not being especially friendly towards the LSI expander chips in the older 3Gb/s SMC backplanes when used for SATA-over-SAS tunneling. The current arrangements are pretty solid otherwise. The SATA-based boxes can be a little cranky when a drive toasts, of course - they sit and hang for a while until they finally decide to offline the drive. We take that as par for the course; for the application in question (basically, storing huge amounts of data on the odd occasion that someone has a need for it), it's not exactly a showstopper.

I am curious whether there is any practical upper limit on the number of vdevs, or how far one might push this kind of configuration in terms of pool size - assuming a sufficient quantity of RAM, of course. I'm sure I will need to split this up someday, but for the application there's just something hideously convenient about leaving it all in one filesystem in one pool.

-bacon
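For reference, a minimal sketch of the kind of commands behind the layout described above - creating a gzip-compressed raidz2 pool and then growing it by adding more vdevs. The pool name "srv" is taken from the zpool list output; the c*t*d* device names are placeholders, not from the original post:

  # create a pool from one 7-disk raidz2 vdev and enable gzip compression
  zpool create srv raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0
  zfs set compression=gzip srv

  # later, grow the same pool by adding another 7-disk raidz2 vdev;
  # existing data stays put, new writes spread across all vdevs
  zpool add srv raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

  # check the result
  zpool list srv

Note that on these Solaris 10 releases zpool add is effectively one-way: a top-level vdev cannot be removed again, so the everything-in-one-pool approach is a long-term commitment.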