If one were sticking with OpenSolaris for the short term, is something older than b134 more stable/less buggy? Not using dedup.
-J

On Thu, Sep 23, 2010 at 6:04 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> Hi Charles,
> There are quite a few bugs in b134 that can lead to this. Alas, due to the new
> regime, there was a period of time where the distributions were not being
> delivered. If I were in your shoes, I would upgrade to OpenIndiana b147, which
> has 26 weeks of maturity and bug fixes over b134.
>
> http://www.openindiana.org
>  -- richard
>
> On Sep 23, 2010, at 2:48 PM, Charles J. Knipe wrote:
>
> > So, I'm still having problems with intermittent hangs on write with my ZFS
> > pool. Details from my original post are below. Since posting that, I've
> > gone back and forth with a number of you and gotten a lot of useful advice,
> > but I'm still trying to get to the root of the problem so I can correct it.
> > Since the original post I have:
> >
> > - Gathered a great deal of information in the form of kernel thread dumps,
> >   zio_state dumps, and live crash dumps while the problem is happening.
> > - Been advised that my ruling out of dedup was probably premature, as I
> >   still likely have a good deal of deduplicated data on disk.
> > - Checked just about every log and counter that might indicate a hardware
> >   error, without finding one.
> >
> > I was wondering at this point if someone could give me some pointers on the
> > following:
> >
> > 1. Given the dumps and diagnostic data I've gathered so far, is there a way
> >    I can determine for certain where in the ZFS driver I'm spending so much
> >    time hanging? At the very least I'd like to determine whether it is, in
> >    fact, a deduplication issue.
> >
> > 2. If it is, in fact, a deduplication issue, would my only recourse be a
> >    new pool and a send/receive operation? The data we're storing is VMFS
> >    volumes for ESX. We're tossing around the idea of creating new volumes
> >    in the same pool (now that dedup is off) and migrating VMs over in small
> >    batches. The theory is that we would be writing non-deduped data this
> >    way, and when we were done we could remove the deduplicated volumes. Is
> >    this sound?
> >
> > Thanks again for all the help!
> >
> > -Charles
> >
> >> Howdy,
> >>
> >> We're having a ZFS performance issue over here that I was hoping you guys
> >> could help me troubleshoot. We have a ZFS pool made up of 24 disks,
> >> arranged into 7 raid-z devices of 4 disks each. We're using it as an iSCSI
> >> back-end for VMware and some Oracle RAC clusters.
> >>
> >> Under normal circumstances performance is very good, both in benchmarks
> >> and under real-world use. Every couple of days, however, I/O seems to hang
> >> for anywhere between several seconds and several minutes. The hang seems
> >> to be a complete stop of all write I/O. The following zpool iostat
> >> illustrates:
> >>
> >> pool0       2.47T  5.13T    120      0   293K      0
> >> pool0       2.47T  5.13T    127      0   308K      0
> >> pool0       2.47T  5.13T    131      0   322K      0
> >> pool0       2.47T  5.13T    144      0   347K      0
> >> pool0       2.47T  5.13T    135      0   331K      0
> >> pool0       2.47T  5.13T    122      0   295K      0
> >> pool0       2.47T  5.13T    135      0   330K      0
> >>
> >> While this is going on our VMs all hang, as do any "zfs create" commands
> >> or attempts to touch/create files in the ZFS pool from the local system.
> >> After several minutes the system "un-hangs" and we see very high write
> >> rates before things return to normal across the board.
> >>
> >> Some more information about our configuration: We're running OpenSolaris
> >> snv_134. ZFS is at version 22. Our disks are 15k RPM 300GB Seagate
> >> Cheetahs, mounted in Promise J610S Dual enclosures, hanging off a Dell SAS
> >> 5/E controller. We'd tried out most of this configuration previously on
> >> OpenSolaris 2009.06 without running into this problem. The only thing
> >> that's new, aside from the newer OpenSolaris/ZFS, is a set of four SSDs
> >> configured as log disks.
> >>
> >> At first we blamed dedup, but we've disabled that. Next we suspected the
> >> SSD log disks, but we've seen the problem with those removed, as well.
> >>
> >> Has anyone seen anything like this before? Are there any tools we can use
> >> to gather information during the hang which might be useful in determining
> >> what's going wrong?
> >>
> >> Thanks for any insights you may have.
> >>
> >> -Charles
>
> --
> OpenStorage Summit, October 25-27, Palo Alto, CA
> http://nexenta-summit2010.eventbrite.com
> ZFS and performance consulting
> http://www.RichardElling.com
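A couple of things worth trying on question (1), sketched from general ZFS
experience rather than tested against b134, so treat the exact invocations as
a starting point:

Whether dedup is still in play is visible from the pool itself. A dedupratio
above 1.00x means deduplicated blocks are still referenced, and zdb's DDT
statistics show how many entries the dedup table is carrying:

  # zpool get dedupratio pool0
  # zdb -DD pool0

To see where writers are parked during a hang, kernel stacks filtered to the
zfs module are the quickest first look, and timing spa_sync with DTrace will
tell you whether the stall is the transaction-group sync itself:

  # echo "::stacks -m zfs" | mdb -k

  # dtrace -n '
      fbt:zfs:spa_sync:entry  { self->t = timestamp; }
      fbt:zfs:spa_sync:return /self->t/ {
          @["spa_sync (ms)"] = quantize((timestamp - self->t) / 1000000);
          self->t = 0;
      }'

If the hangs line up with multi-minute spa_sync times and the stacks are
sitting in ddt_* functions, that points at DDT lookups (i.e., a dedup table
that no longer fits in the ARC) rather than at the disks.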
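On question (2), a new pool shouldn't be strictly necessary. With dedup
already off, a local send/receive into a fresh volume in the same pool
rewrites the blocks non-deduped; something like the following, with the
volume names made up for illustration:

  # zfs snapshot pool0/vmfs01@migrate
  # zfs send pool0/vmfs01@migrate | zfs receive pool0/vmfs01-new

The migrate-VMs-in-small-batches plan is sound for the same reason: new
writes land non-deduped. Either way, the old blocks are only freed once the
original volumes (and any snapshots pinning their blocks) are destroyed, and
those frees still have to walk the DDT, so expect the destroys themselves to
be slow.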