If one were sticking with OpenSolaris for the short term, is something older
than b134 more stable/less buggy? We're not using de-dupe.

-J

On Thu, Sep 23, 2010 at 6:04 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> Hi Charles,
> There are quite a few bugs in b134 that can lead to this. Alas, due to the
> new regime, there was a period of time when the distributions were not
> being delivered. If I were in your shoes, I would upgrade to OpenIndiana
> b147, which has 26 weeks of maturity and bug fixes over b134.
>
> http://www.openindiana.org
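>
> The move is a straight IPS update; the authoritative steps are on that
> site, but the rough shape (publisher URL and flags from memory, so
> double-check against the wiki) is:
>
>   # pkg set-publisher --non-sticky opensolaris.org
>   # pkg set-publisher -P -g http://pkg.openindiana.org/dev openindiana.org
>   # pkg image-update -v
>   # init 6
>
> Since image-update lands in a new boot environment, you can fall back to
> b134 from the boot menu if anything goes wrong.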
>  -- richard
>
>
>
> On Sep 23, 2010, at 2:48 PM, Charles J. Knipe wrote:
>
> > So, I'm still having problems with intermittent hangs on write with my
> > ZFS pool.  Details from my original post are below.  Since posting that,
> > I've gone back and forth with a number of you, and gotten a lot of useful
> > advice, but I'm still trying to get to the root of the problem so I can
> > correct it.  Since the original post I have:
> >
> > - Gathered a great deal of information in the form of kernel thread
> >   dumps, zio_state dumps, and live crash dumps while the problem is
> >   happening.
> > - Been advised that my ruling out of dedupe was probably premature, as I
> >   still likely have a good deal of deduplicated data on-disk.
> > - Checked just about every log and counter that might indicate a hardware
> >   error, without finding one.
> >
> > I was wondering at this point if someone could give me some pointers on
> > the following:
> > 1. Given the dumps and diagnostic data I've gathered so far, is there a
> >    way I can determine for certain where in the ZFS driver I'm spending
> >    so much time hanging?  At the very least I'd like to try to determine
> >    whether it is, in fact, a deduplication issue.
> > 2. If it is, in fact, a deduplication issue, would my only recourse be a
> >    new pool and a send/receive operation?  The data we're storing is VMFS
> >    volumes for ESX.  We're tossing around the idea of creating new volumes
> >    in the same pool (now that dedupe is off) and migrating VMs over in
> >    small batches.  The theory is that we would be writing non-deduped data
> >    this way, and when we were done we could remove the deduplicated
> >    volumes.  Is this sound?  (Rough sketches of what I have in mind are
> >    below.)
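> >
> > To make question 1 concrete, I've been timing txg syncs with DTrace along
> > these lines (a rough sketch, untested as pasted here):
> >
> >   # dtrace -n 'fbt::spa_sync:entry { self->t = timestamp; }
> >       fbt::spa_sync:return /self->t/ {
> >         @["spa_sync (ms)"] = quantize((timestamp - self->t) / 1000000);
> >         self->t = 0; }'
> >
> > If the multi-minute hangs line up with multi-minute spa_sync calls, the
> > stall is in the sync path.  For question 2, my understanding is that the
> > remaining dedup table can be inspected with (flag spellings from memory):
> >
> >   # zpool status -D pool0    # DDT entry count, in-core/on-disk size
> >   # zdb -DD pool0            # full DDT histogram
> >
> > A shrinking entry count as the old volumes are destroyed would tell us
> > the migration plan is working.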
> >
> > Thanks again for all the help!
> >
> > -Charles
> >
> >> Howdy,
> >>
> >> We're having a ZFS performance issue over here that I
> >> was hoping you guys could help me troubleshoot.  We
> >> have a ZFS pool made up of 28 disks, arranged into 7
> >> raid-z devices of 4 disks each.  We're using it as an
> >> iSCSI back-end for VMware and some Oracle RAC
> >> clusters.
> >>
> >> Under normal circumstances performance is very good
> >> both in benchmarks and under real-world use.  Every
> >> couple days, however, I/O seems to hang for anywhere
> >> between several seconds and several minutes.  The
> >> hang seems to be a complete stop of all write I/O.
> >> The following zpool iostat illustrates:
> >>
> >> pool0       2.47T  5.13T    120      0   293K      0
> >> pool0       2.47T  5.13T    127      0   308K      0
> >> pool0       2.47T  5.13T    131      0   322K      0
> >> pool0       2.47T  5.13T    144      0   347K      0
> >> pool0       2.47T  5.13T    135      0   331K      0
> >> pool0       2.47T  5.13T    122      0   295K      0
> >> pool0       2.47T  5.13T    135      0   330K      0
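> >>
> >> (For reading the above: it's from something like "zpool iostat pool0 1"
> >> with the headers trimmed; the columns are alloc, free, read and write
> >> operations, and read and write bandwidth.  Note the write columns are
> >> pinned at zero.)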
> >>
> >> While this is going on, our VMs all hang, as do any
> >> "zfs create" commands or attempts to touch/create
> >> files in the zfs pool from the local system.  After
> >> several minutes the system "un-hangs" and we see very
> >> high write rates before things return to normal
> >> across the board.
> >>
> >> Some more information about our configuration:  We're
> >> running OpenSolaris snv_134.  ZFS is at version 22.
> >> Our disks are 15K RPM 300GB Seagate Cheetahs, mounted
> >> in Promise J610S Dual enclosures, hanging off a Dell
> >> SAS 5/E controller.  We'd tried out most of this
> >> configuration previously on OpenSolaris 2009.06
> >> without running into this problem.  The only thing
> >> that's new, aside from the newer OpenSolaris/ZFS, is
> >> a set of four SSDs configured as log disks.
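> >>
> >> (For layout reference: the device names below are made up, but the
> >> shape of the pool per "zpool status pool0" is along these lines:)
> >>
> >>   pool0
> >>     raidz1-0  c1t0d0 c1t1d0 c1t2d0 c1t3d0
> >>       [six more 4-disk raidz1 vdevs]
> >>     logs
> >>       c2t0d0 c2t1d0 c2t2d0 c2t3d0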
> >>
> >> At first we blamed de-dupe, but we've disabled that.
> >> Next we suspected the SSD log disks, but we've seen
> >> the problem with those removed, as well.
> >>
> >> Has anyone seen anything like this before?  Are there
> >> any tools we can use to gather information during the
> >> hang which might be useful in determining what's
> >> going wrong?
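> >>
> >> (We have root on the box and can poke at it mid-hang; would something
> >> along these lines be useful?  The dcmd names are from memory:)
> >>
> >>   # echo "::stacks -m zfs" | mdb -k   # kernel stacks in the zfs module
> >>   # echo "::zio_state" | mdb -k       # outstanding I/Os and their stages
> >>   # savecore -L                       # live crash dump for later analysis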
> >>
> >> Thanks for any insights you may have.
> >>
> >> -Charles
>
> --
> OpenStorage Summit, October 25-27, Palo Alto, CA
> http://nexenta-summit2010.eventbrite.com
> ZFS and performance consulting
> http://www.RichardElling.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
