Re: [PATCH] mpt2sas: don't handle broadcast primitives

2013-07-24 Baruch Even
On Sat, Jul 20, 2013 at 1:11 AM, Jörn Engel wrote:
> On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote:
>>
>> The handling of broadcast primitives involves
>> _scsih_block_io_all_device(), which does what the name implies. I have
>> observed cases with >60s of blocking io on all devices, caus…

Re: interrupt coalescing in LSI HBA controller

2013-06-17 Baruch Even
On Mon, Jun 17, 2013 at 11:34 PM, Zheng Da wrote:
> Hello,
>
> I have tried lsiutil and it cannot recognize the LSI controllers. I
> have talked to the LSI tech support and they told me that lsiutil
> doesn't support this type of HBA controllers.
> I found lsiutil in some other place. I couldn't f…

Re: interrupt coalescing in LSI HBA controller

2013-06-17 Baruch Even
On Fri, Jun 14, 2013 at 7:44 PM, Zheng Da wrote:
> Hello,
>
> First, I'm not very sure if this mailing list allows to ask technical
> questions. I apologize if it's not a right place to ask questions.
>
> I use one LSI SAS 9207-8e controllers to attach 8 SSDs to my server.
> They together can deli…

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-09 Baruch Even
On Thu, Jun 6, 2013 at 12:43 PM, Hannes Reinecke wrote:
> When a command runs into a timeout we need to send an 'ABORT TASK'
> TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
>
> Conceptually, however, this function is a normal SCSI command, so
> there is no need to enter the…

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-28 Baruch Even
On Tue, May 28, 2013 at 5:38 PM, Jeremy Linton wrote:
> This is another part of what formed my opinions about error
> isolation. If one
> of your devices goes out to lunch and isn't recovering via abort/lun reset.
> It's done! Wrecking the rest of the SAN doing "bus resets" and HBA resets…

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-27 Baruch Even
On Mon, May 27, 2013 at 11:41 PM, James Bottomley wrote:
> On Mon, 2013-05-27 at 16:39 +0200, Hannes Reinecke wrote:
>
>> - LLDDs typically won't return a command status even for a
>> command which has been aborted via ABORT TASK TMF.
>> So the midlayer probably will never get notified if
>> …

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Baruch Even
On Mon, May 13, 2013 at 6:58 PM, Jeremy Linton wrote:
> On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
>> The other LUNs haven't reported an error. But how do you know whether they
>> are still okay? The other LUNs might simply be idle, and no commands have
>> been sent to them.
>
> Well, h…

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Baruch Even
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke wrote:
> On 05/10/2013 07:51 PM, Baruch Even wrote:
>>
>> The error handling I have in mind (admittedly, not fully thought out)
>> should work for both FC and SAS. Currently the error recovery
>> progresses at the host…

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Baruch Even
On Fri, May 10, 2013 at 5:53 PM, Martin K. Petersen wrote:
>>>>>> "Baruch" == Baruch Even writes:
>
> Baruch> Actually reducing the timeouts is probably not a good approach
> Baruch> since it will cause the host to take a more radical approach…

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Baruch Even
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne wrote:
> On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
>> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
>> >
>> > On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
>> > > Introduce eh_timeo…

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Baruch Even
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
>
> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> > Introduce eh_timeout which can be used for error handling purposes. This
> > was previously hardcoded to 10 seconds in the SCSI error handling
> > code. However, for some fast-fa…

Re: error handler scheduling

2013-04-12 Baruch Even
On Fri, Apr 12, 2013 at 12:42 PM, Ren Mingxin wrote:
>
> Please let me summarize what this thread has talked about the scsi
> eh latency:
>
> 1) some scsi cmds' timeout values are inappropriate, we can avoid
>    timeout by:
>    a) sg_format sets the IMMED bit and use TEST UNIT READY or REQUEST…

Re: mpt2sas + raid10 goes boom

2013-04-08 Baruch Even
> Apr 8 15:08:41 b4 kernel: [ 436.346595] mpt2sas0: log_info(0x31120320):
> originator(PL), code(0x12), sub_code(0x0320)

This log_info error code means a bad TX SGE (scatter-gather element). I don't know the code well enough to point to the issue, but it seems like there is a problem in the driver or in the higher layers that provided…