On Thu, Jun 02, 2011 at 11:39:13AM -0700, Donald Stahl wrote:
> > Yup; reset storms affected us as well (we were using the X-25 series
> > for ZIL/L2ARC).  Only the ZIL drives were impacted, but it was a large
> > impact :)
> What did you see with your reset storm? Were there log errors in
> /var/adm/messages or did you need to check the controller logs with
> something like lsi util?

Yep, /var/adm/messages had Unit Attention errors.  Ref:

    http://markmail.org/message/5rmfzvqwlmosh2oh
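
For anyone who wants to check their own logs for the same symptom, here is
a minimal sketch (not something we ran at the time) that counts lines
mentioning "Unit Attention" and groups them by day.  The function names
are made up for illustration, and it assumes the usual syslog
"Mon DD HH:MM:SS" prefix; the exact wording your driver logs may differ,
so adjust the pattern to match.

```python
import re
from collections import Counter

# Phrase as described in this thread; the exact driver wording is an assumption.
PATTERN = re.compile(r"unit attention", re.IGNORECASE)
# Traditional syslog date prefix, e.g. "Jun  2 11:39:13 ...".
DATE_PREFIX = re.compile(r"([A-Z][a-z]{2}\s+\d{1,2})\s")


def count_unit_attention(lines):
    """Return (total, per_day) for lines that mention Unit Attention.

    Lines without a recognizable syslog date are counted under "unknown".
    """
    total = 0
    per_day = Counter()
    for line in lines:
        if not PATTERN.search(line):
            continue
        total += 1
        m = DATE_PREFIX.match(line)
        # Collapse the padding in dates like "Jun  2" to "Jun 2".
        day = " ".join(m.group(1).split()) if m else "unknown"
        per_day[day] += 1
    return total, per_day


def scan_file(path="/var/adm/messages"):
    """Convenience wrapper: scan a log file on disk."""
    with open(path, errors="replace") as f:
        return count_unit_attention(f)
```

Calling scan_file() with no arguments reads /var/adm/messages; a sudden
jump in the per-day count is the sort of thing to look for.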

> Did the reset workaround in the blog post help?

We re-architected before reading the blog post, so I'm unsure if it
would have helped or not.  In any case, moving the SSDs internal lets
us use additional hot-swappable data disks, so it was beneficial in
other areas as well.

> 
> The expanders you were using were SAS/SATA expanders? Or SAS expanders
> with adapters on the drive to allow the use of SATA disks?

The expander was a SuperMicro SAS-846EL1, which is a SAS expander but has
SFF-8482 connectors to provide compatibility with SATA drives.

> 
> I've been using 4 X-25E's with Promise J610sD SAS shelves and the
> AAMUX adapters and have yet to have a problem.

It definitely seemed intermittent, and various suggestions we received
indicated we might need to downgrade our backplane/expander's firmware.
Never did try that, but it wouldn't surprise me if behavior was
better/worse on different backplanes...

> 
> > Our solution was to move the SSDs off of the expander and remount
> > internally attached via one of the LSI SAS ports directly (we also had
> > problems with running the drives directly off the on-board SATA ports
> > on our SuperMicro motherboards -- occasionally the entire zpool would
> > freeze up).
>
> I'm surprised you had problems with the internal SATA ports as well-
> any idea what was causing the problems there?

Nope.  I posted this:

    http://mail.opensolaris.org/pipermail/zfs-discuss/2010-October/045625.html

But got no responses.  We resolved the NFS errors (which I believe were
coincidental), but the watchdog port issues kept recurring without
rhyme or reason.  The box itself wouldn't lock up, but the zpool would
become unresponsive and we'd have to hard reset.

This was all production stuff, so as soon as we were able to, we
ditched using the SATA ports entirely instead of pursuing a fix with
Sun.

Ray
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
