Re: [zfs-discuss] Intermittent ZFS hang

Richard Elling Thu, 23 Sep 2010 17:05:39 -0700

Hi Charles,
There are quite a few bugs in b134 that can lead to this. Alas, under the new
regime there was a period during which no distributions were being delivered.
If I were in your shoes, I would upgrade to OpenIndiana b147, which has 26
weeks of maturity and bug fixes over b134.


http://www.openindiana.org
 -- richard



On Sep 23, 2010, at 2:48 PM, Charles J. Knipe wrote:

> So, I'm still having problems with intermittent hangs on write with my ZFS 
> pool.  Details from my original post are below.  Since posting that, I've 
> gone back and forth with a number of you, and gotten a lot of useful advice, 
> but I'm still trying to get to the root of the problem so I can correct it.  
> Since the original post I have:
> 
> - Gathered a great deal of information in the form of kernel thread dumps,
> zio_state dumps, and live crash dumps while the problem is happening.
> - Been advised that my ruling out of dedupe was probably premature, as I still
> likely have a good deal of deduplicated data on-disk.
> - Checked just about every log and counter that might indicate a hardware
> error, without finding one.
> 
> I was wondering at this point if someone could give me some pointers on the 
> following:
> 1. Given the dumps and diagnostic data I've gathered so far, is there a way I 
> can determine for certain where in the ZFS driver I'm spending so much time 
> hanging?  At the very least I'd like to try to determine whether it is, in
> fact, a deduplication issue.
> 2. If it is, in fact, a deduplication issue, would my only recourse be a new 
> pool and a send/receive operation?  The data we're storing is VMFS volumes 
> for ESX.  We're tossing around the idea of creating new volumes in the same 
> pool (now that dedupe is off) and migrating VMs over in small batches.  The 
> theory is that we would be writing non-deduped data this way, and when we 
> were done we could remove the deduplicated volumes.  Is this sound?
> 
> Thanks again for all the help!
> 
> -Charles
> 
>> Howdy,
>> 
>> We're having a ZFS performance issue over here that I
>> was hoping you guys could help me troubleshoot.  We
>> have a ZFS pool made up of 28 disks, arranged into 7
>> raid-z devices of 4 disks each.  We're using it as an
>> iSCSI back-end for VMware and some Oracle RAC
>> clusters.
>> 
>> Under normal circumstances performance is very good
>> both in benchmarks and under real-world use.  Every
>> couple of days, however, I/O seems to hang for anywhere
>> between several seconds and several minutes.  The
>> hang seems to be a complete stop of all write I/O.
>> The following zpool iostat output illustrates this:
>> 
>> pool0       2.47T  5.13T    120      0   293K      0
>> pool0       2.47T  5.13T    127      0   308K      0
>> pool0       2.47T  5.13T    131      0   322K      0
>> pool0       2.47T  5.13T    144      0   347K      0
>> pool0       2.47T  5.13T    135      0   331K      0
>> pool0       2.47T  5.13T    122      0   295K      0
>> pool0       2.47T  5.13T    135      0   330K      0
>> 
>> While this is going on, our VMs all hang, as do any
>> "zfs create" commands or attempts to touch/create
>> files in the zfs pool from the local system.  After
>> several minutes the system "un-hangs" and we see very
>> high write rates before things return to normal
>> across the board.
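A rough way to spot these stalls in captured output (for example, output saved with `zpool iostat pool0 1 60 > iostat.log`) is to look for consecutive samples whose write-operations column is zero. The Python sketch below assumes the default seven-column layout quoted above (pool, two capacity columns, read ops, write ops, read bandwidth, write bandwidth) and that large counts are abbreviated with K/M/G/T suffixes on a base of 1024. The function names and the three-sample threshold are illustrative choices, not part of any ZFS tool.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed multipliers for abbreviated counts such as "1.2K" (base 1024).
SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_count(field: str) -> float:
    """Convert a zpool-iostat style number such as '0', '127' or '1.2K'."""
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)([KMGT]?)", field)
    if m is None:
        raise ValueError(f"not a count: {field!r}")
    return float(m.group(1)) * SUFFIX[m.group(2)]


def write_ops(line: str) -> Optional[float]:
    """Return the write-operations column (5th field), or None for non-data lines."""
    fields = line.split()
    if len(fields) != 7:
        return None  # blank lines and the capacity/operations/bandwidth banner
    try:
        return parse_count(fields[4])
    except ValueError:
        return None  # column-name line and '-----' separator line


def find_write_stalls(lines: Iterable[str], min_samples: int = 3) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of >= min_samples zero-write samples."""
    samples = [w for w in (write_ops(line) for line in lines) if w is not None]
    stalls: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, w in enumerate(samples):
        if w == 0:
            if run_start is None:
                run_start = i  # a zero-write run begins here
        else:
            if run_start is not None and i - run_start >= min_samples:
                stalls.append((run_start, i - run_start))
            run_start = None
    # A stall still in progress at the end of the capture also counts.
    if run_start is not None and len(samples) - run_start >= min_samples:
        stalls.append((run_start, len(samples) - run_start))
    return stalls


if __name__ == "__main__":
    captured = """\
pool0       2.47T  5.13T    120      0   293K      0
pool0       2.47T  5.13T    127      0   308K      0
pool0       2.47T  5.13T    131      0   322K      0
"""
    print(find_write_stalls(captured.splitlines()))  # prints [(0, 3)]
```

Indices count parsed data samples rather than raw lines, so header and separator lines do not shift them. A stall only tells you when writes stopped, not why; it is a way to line up the timing with the thread dumps and zio_state dumps gathered during the hang.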
>> 
>> Some more information about our configuration:  We're
>> running OpenSolaris snv_134.  ZFS is at version 22.
>> Our disks are 15K RPM 300 GB Seagate Cheetahs, mounted
>> in Promise J610S Dual enclosures, hanging off a Dell
>> SAS 5/e controller.  We had tried out most of this
>> configuration previously on OpenSolaris 2009.06
>> without running into this problem.  The only thing
>> that's new, aside from the newer OpenSolaris/ZFS, is
>> a set of four SSDs configured as log disks.
>> 
>> At first we blamed de-dupe, but we've disabled that.
>> Next we suspected the SSD log disks, but we've seen
>> the problem with those removed, as well.
>> 
>> Has anyone seen anything like this before?  Are there
>> any tools we can use to gather information during the
>> hang which might be useful in determining what's
>> going wrong?
>> 
>> Thanks for any insights you may have.
>> 
>> -Charles
> -- 
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com