On 13/04/2012 9:42 a.m., Rich wrote:
Those patches aren't yet in OI/IL mainline, as of when I looked today.
Regarding when they'll be usable, either in mainline or by fetching
them yourself...
17:33< PMT> ping Triskelios - I don't suppose you have your pending
patches to mpt_sas (per
http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/)
laying around somewhere easily grabbable?
17:34<@Triskelios> not at the moment, should land on our public repo
on bitbucket sometime soon
- Rich
Just a word of caution to other hackers like me.
I was experimenting with shorter drive timeouts and destroyed the zpool
while testing different timeouts on some SATA disks (on an 8-port Marvell
controller).
All 20 drives in the pool dropped like flies when the I/O load
increased, and I had to rebuild the pool from scratch.
In my experience, drive issues are often caused by a bad block, which
shows up as a jump in iostat error counts and can occasionally hurt
performance well before the disk actually fails.
I now check iostat regularly and spare out a disk with issues before
the failure becomes permanent. That does not always avoid a big
impact, though, as some drives fail during the resilver itself.
Running a full read/write using format is usually enough to fix the bad
block permanently.
If not, then it's RMA'd.
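The routine of watching iostat for a jump in errors can be scripted. The following is a hypothetical Python sketch, not something posted in the thread: it assumes the per-device summary line of Solaris `iostat -En` looks like `DEVICE  Soft Errors: N Hard Errors: N Transport Errors: N` (check this against your own system), parses text in that shape, and flags any device whose hard or transport error count grew since an earlier snapshot. The device names, product strings, and counts in the sample are made up.

```python
import re

# Made-up sample in the assumed shape of Solaris `iostat -En` output.
SAMPLE = """\
c3t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
c3t1d0           Soft Errors: 0 Hard Errors: 14 Transport Errors: 2
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
Media Error: 12 Device Not Ready: 0 No Device: 0 Recoverable: 0
"""

# Assumed layout: device name, then the three cumulative error counters.
SUMMARY = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def parse_error_counts(text):
    """Return {device: (soft, hard, transport)} from iostat -En style text."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY.match(line)
        if match:
            counts[match.group(1)] = tuple(int(n) for n in match.group(2, 3, 4))
    return counts


def suspicious(previous, current):
    """Devices whose hard or transport error count grew between snapshots.

    A device missing from the previous snapshot is compared against zeros.
    """
    flagged = []
    for dev, (_soft, hard, transport) in current.items():
        _old_soft, old_hard, old_transport = previous.get(dev, (0, 0, 0))
        if hard > old_hard or transport > old_transport:
            flagged.append(dev)
    return sorted(flagged)


if __name__ == "__main__":
    baseline = {"c3t0d0": (0, 0, 0), "c3t1d0": (0, 3, 0)}
    print(suspicious(baseline, parse_error_counts(SAMPLE)))  # ['c3t1d0']
```

Comparing against a saved baseline, rather than alerting on any nonzero count, matches the idea of watching for a *jump*: the counters are cumulative, so an old, stable count is less interesting than one that is climbing.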
Mark.
On Thu, Apr 12, 2012 at 1:30 PM, Karl Rossing
<karl.ross...@barobinson.com> wrote:
I'm running into this issue with disconnected drives on snv_134.
Would upgrading to oi_151a2 give me the updated mpt_sas driver mentioned
in this post?
http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
"Update (New): These timeouts don’t do squat because mpt_sas doesn’t honour
the timeouts. This was recently uncovered by Nexenta and a patch to fix it
is about to hit Illumos shortly. I’ll post when it does. Another patch is in
progress which will further improve how mpt_sas handles failed drives.
Thanks to Albert Lee for his work on them - you, sir, rock!"
Karl
On 01/10/2012 10:48 AM, Martin Frost wrote:
> From: Jason Matthews <ja...@broken.net>
> Date: Tue, 10 Jan 2012 08:26:08 -0800
>
>
> you can adjust the disk timeouts in solaris.
Here's an article on how to do that, although it ends with the author
adding this comment "However in testing with failing harddrives (on
mpt_sas anyway), we see that the sd timeouts are completely ignored so
my entire post above is moot!"
http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
I haven't tested this myself. Does it work in OpenIndiana or not?
Martin
> there are two schools of thought here:
>
> 1) accommodate the extremely long timeouts of consumer drives and
> let the drive decide whether to report an error back (fail itself
> out)
>
> 2) set the time outs very narrowly and be aggressive in letting zfs
> fail out disks.
>
> i generally go with option 2.
>
> Sent from Jasons' hand held
>
> On Jan 10, 2012, at 7:13 AM, Maurilio Longo <maurilio.lo...@libero.it> wrote:
>
> > Geoff,
> >
> > I've hit this problem several times in the past, with OpenSolaris
> > and then with OpenIndiana.
> >
> > There are, to my knowledge, no available solutions; it is so by
> > design!
> >
> > If a disk stops responding, the pool waits until it responds
> > again (sometimes pulling the disk out of its slot and reinserting
> > it resets the link, and it starts working again).
> >
> > I was not able to assess what happens if I set failmode to continue.
> >
> > I suspect it would be no better, since you still cannot write to the
> > pool.
> >
> > This is IMHO the biggest problem with ZFS: I cannot instruct it
> > to stop using a failed device while some level of redundancy is
> > still available.
> >
> > Waiting is OK only if an entire vdev stops responding, not if a
> > disk in a vdev with redundancy has problems, whether fatal or
> > transitory.
> >
> > Best regards.
> >
> > Maurilio.
> >
> >
> > PS. Using server-grade disks (those with TLER) makes it possible
> > to overcome this problem for transitory errors.
> >
> >
> > Geoff Nordli wrote:
> >
> >> Part of my concern is why one disk would have completely brought
> >> down the system. I have seen this come up on the list before,
> >> but I don't remember any resolution to it.
> >>
> >> Does anyone have any ideas on how to prevent this from happening
> >> in the future?
> >>
> >> thanks,
> >>
> >> Geoff
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss