Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2006-11-29 Thread Benny Halevy

Jens Axboe wrote:

On Mon, Nov 27 2006, Mike Christie wrote:

Mike Christie wrote:

Boaz Harrosh wrote:

Playing with some tests which I admit are not 100% orthodox I have
stumbled upon a bug that raises a serious question:

In the call to scsi_execute_async() in the use_sg case, must the
scatterlist* (pointed to by buffer) map a buffer that's contiguous in
virtual memory or is it allowed to map disjoint segments of memory?

I thought they were contiguous. I think James has said before that they
can be disjoint. When we converted sg it did not look like sg or st
supported disjoint. The main non dio path used a buffer from
get_free_pages so I thought that would always be contiguous. The dio
path then always set the first sg offset, but the rest it set to zero.

And the len is set to page size for the middle entries too.

But for the non DIO st path we can end up with some middle sg entries
that are not a full page so that code in scsi_execute_async is broken
for that.


If something doesn't work with non-contig sg entries, that would be a
bug. If the question is regarding holes in the sg list, that is probably
uncharted territory and I would not regard that as supported.
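
The layout Mike describes for the dio path (only the first entry has a
nonzero offset, and every middle entry covers a full page) can be written
down as a check. What follows is a minimal user-space sketch, not kernel
code: struct sketch_sg, SKETCH_PAGE_SIZE and sketch_sg_is_page_run() are
hypothetical names, and a 4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical stand-in for the offset/length part of an sg entry. */
struct sketch_sg {
    unsigned int offset;  /* byte offset into the page */
    unsigned int length;  /* bytes used within the page */
};

/*
 * Return true if the list has the shape the dio path produces:
 * only the first entry may start mid-page, and only the last
 * entry may end before the page boundary.
 */
static bool sketch_sg_is_page_run(const struct sketch_sg *sgl, size_t nents)
{
    for (size_t i = 0; i < nents; i++) {
        const struct sketch_sg *sg = &sgl[i];

        /* Reject empty entries and entries that spill past the page. */
        if (sg->length == 0 || sg->offset >= SKETCH_PAGE_SIZE ||
            sg->length > SKETCH_PAGE_SIZE - sg->offset)
            return false;
        /* Only the first entry may have a nonzero offset. */
        if (i != 0 && sg->offset != 0)
            return false;
        /* Only the last entry may stop short of the page end. */
        if (i != nents - 1 &&
            sg->offset + sg->length != SKETCH_PAGE_SIZE)
            return false;
    }
    return true;
}
```

A list with a short middle entry, as in the non DIO st case mentioned
above, fails this check.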



Jens, I'm not sure I understand the terms you used.  Can you please
define more clearly what you mean by "non-contig sg entries" vs.
"holes in the sg list"?

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[2.6 patch] drivers/scsi/scsi_error.c should #include "scsi_transport_api.h"

2006-11-29 Thread Adrian Bunk
Every file should #include the headers containing the prototypes for 
its global functions.
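
To illustrate the rule in one file: when the prototype is visible at the
point of definition, the compiler can reject a definition whose signature
has drifted from it. This is only a sketch; sketch_schedule_recovery() and
the header name in the comment are hypothetical and are not part of the
patch. With gcc, -Wmissing-prototypes should flag a global function that
is defined without such a prototype in scope.

```c
/* Part 1: what the header would contain (hypothetical sketch_api.h). */
int sketch_schedule_recovery(int host_no);

/*
 * Part 2: the defining .c file, which includes that header.
 * If the definition below took a different argument or return type,
 * the compiler would report conflicting types for the function.
 */
int sketch_schedule_recovery(int host_no)
{
    /* Placeholder body for illustration only. */
    return host_no >= 0 ? 0 : -1;
}
```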

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

--- linux-2.6.19-rc6-mm2/drivers/scsi/scsi_error.c.old  2006-11-29 09:58:41.0 +0100
+++ linux-2.6.19-rc6-mm2/drivers/scsi/scsi_error.c  2006-11-29 09:58:58.0 +0100
@@ -36,6 +36,7 @@
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
+#include "scsi_transport_api.h"
 
 #define SENSE_TIMEOUT  (10*HZ)
 #define START_UNIT_TIMEOUT (30*HZ)



Re: aic94xx panic on module load

2006-11-29 Thread Mark Haverkamp
On Tue, 2006-11-28 at 20:52 -0500, Douglas Gilbert wrote:
> Mark Haverkamp wrote:
> > On Tue, 2006-11-28 at 13:46 -0800, Mark Haverkamp wrote:
> >> On Tue, 2006-11-28 at 13:44 -0500, Douglas Gilbert wrote:
> >>
> >> [ ... ]
> >>
> > 
> > I don't know if this helps, but I found the verbose option.  Here is a
> > little debug output.
> > 
> > 
> > ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl
> > Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
> > send_req_mpt: subvalue=0  SAS address=0x500508b300a27a2f
> > mptctl two scatter gather list interface
> > IOCStatus=0x1
> > IOCStatus=0x1 IOCLogInfo=0xA27A2F SASStatus=0x0
> > smp_send_req failed, res=-1
> 
> Mark,
> The iocnum may be greater than 0 (especially if you have
> other MPT Fusion HBAs (any kind) in that computer).
> Have a look in the log around where the mptsas driver
> is registered and look for the string "ioc". The number
> following "ioc" is what you need. If you find "ioc3" then
> try:
> 
>  ./smp_discover -p 12 -s 0x500508b300a27a2f /dev/mptctl,3
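
The "/dev/mptctl,3" form above carries the ioc number as a suffix after a
comma. A minimal sketch of splitting such an argument follows. It is not
the smp_utils implementation: sketch_split_ioc() is a hypothetical helper,
and defaulting to 0 when no suffix is given is an assumption based on the
"subvalue=0" line in the run without a suffix.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "path,N" into path and N.  Returns 0 on success, -1 on a
 * malformed argument or if path does not fit into path_sz bytes.
 */
static int sketch_split_ioc(const char *arg, char *path, size_t path_sz,
                            int *ioc)
{
    const char *comma = strrchr(arg, ',');
    size_t len = comma ? (size_t)(comma - arg) : strlen(arg);
    char *end;
    long val;

    if (len == 0 || len >= path_sz)
        return -1;
    memcpy(path, arg, len);
    path[len] = '\0';

    *ioc = 0;               /* assumed default when there is no suffix */
    if (comma == NULL)
        return 0;

    errno = 0;
    val = strtol(comma + 1, &end, 10);
    /* Require at least one digit, nothing trailing, and a sane range. */
    if (errno != 0 || end == comma + 1 || *end != '\0' ||
        val < 0 || val > INT_MAX)
        return -1;
    *ioc = (int)val;
    return 0;
}
```

For "/dev/mptctl,2" this yields the path "/dev/mptctl" and ioc number 2.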

OK thanks, I do have an mptspi card too. 
 
# ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl,2
Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
send_req_mpt: subvalue=2  SAS address=0x500508b300a27a2f
mptctl two scatter gather list interface
Discover response:
  expander change count: 0
  phy identifier: 12
  attached device type: end device
  negotiated physical link rate: phy enabled; 3 Gbps
  attached initiator: ssp=0 stp=0 smp=0 sata_host=0
  attached sata port selector: 0
  attached target: ssp=0 stp=1 smp=0 sata_device=0
  SAS address: 0x500508b300a27a2f
  attached SAS address: 0x500508b300a27a2c
  attached phy identifier: 0
  attached inside ZPSDS persistent: 0
  attached requested inside ZPSDS: 0
  attached break_reply capable: 0
  programmed minimum physical link rate: 3 Gbps
  hardware minimum physical link rate: 3 Gbps
  programmed maximum physical link rate: 3 Gbps
  hardware maximum physical link rate: 3 Gbps
  phy change count: 105
  virtual phy: 1
  partial pathway timeout value: 7 us
  routing attribute: direct
  connector type: No information
  connector element index: 0
  connector physical link: 0


> 
> To verify that expander SAS address, try this:
>   find /sys -name "sas_device:expander*"
> cd to any directory found and try "cat sas_address".

0x500508b300a27a2f

> 
> 
> BTW there is a smp_utils version 0.92 beta at
> http://www.torque.net/sg
> the error messages are somewhat clearer.
> 
> 
> Doug Gilbert
> 
-- 
Mark Haverkamp <[EMAIL PROTECTED]>



Re: aic94xx panic on module load

2006-11-29 Thread Douglas Gilbert
Mark Haverkamp wrote:
> On Tue, 2006-11-28 at 20:52 -0500, Douglas Gilbert wrote:
>> Mark Haverkamp wrote:
>>> On Tue, 2006-11-28 at 13:46 -0800, Mark Haverkamp wrote:
>>>> On Tue, 2006-11-28 at 13:44 -0500, Douglas Gilbert wrote:
>>>>
>>>> [ ... ]

>>> I don't know if this helps, but I found the verbose option.  Here is a
>>> little debug output.
>>>
>>>
>>> ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl
>>> Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
>>> send_req_mpt: subvalue=0  SAS address=0x500508b300a27a2f
>>> mptctl two scatter gather list interface
>>> IOCStatus=0x1
>>> IOCStatus=0x1 IOCLogInfo=0xA27A2F SASStatus=0x0
>>> smp_send_req failed, res=-1
>> Mark,
>> The iocnum may be greater than 0 (especially if you have
>> other MPT Fusion HBAs (any kind) in that computer).
>> Have a look in the log around where the mptsas driver
>> is registered and look for the string "ioc". The number
>> following "ioc" is what you need. If you find "ioc3" then
>> try:
>>
>>  ./smp_discover -p 12 -s 0x500508b300a27a2f /dev/mptctl,3
> 
> OK thanks, I do have an mptspi card too. 
>  
> # ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl,2
> Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
> send_req_mpt: subvalue=2  SAS address=0x500508b300a27a2f
> mptctl two scatter gather list interface
> Discover response:
>   expander change count: 0
>   phy identifier: 12
>   attached device type: end device
>   negotiated physical link rate: phy enabled; 3 Gbps
   
A "heads up" here: the "physical" has now been changed
to "logical" in SAS-2. The idea is that up to 4
logical 1.5 Gbps links (e.g. to SATA disks) can be
multiplexed on one 6 Gbps physical link.

>   attached initiator: ssp=0 stp=0 smp=0 sata_host=0
>   attached sata port selector: 0
>   attached target: ssp=0 stp=1 smp=0 sata_device=0
>   SAS address: 0x500508b300a27a2f
>   attached SAS address: 0x500508b300a27a2c
>   attached phy identifier: 0
>   attached inside ZPSDS persistent: 0
>   attached requested inside ZPSDS: 0
>   attached break_reply capable: 0
>   programmed minimum physical link rate: 3 Gbps
>   hardware minimum physical link rate: 3 Gbps
>   programmed maximum physical link rate: 3 Gbps
>   hardware maximum physical link rate: 3 Gbps
>   phy change count: 105
>   virtual phy: 1
>   partial pathway timeout value: 7 us
>   routing attribute: direct
>   connector type: No information
>   connector element index: 0
>   connector physical link: 0

Mark,
Finally ...
So the above is somewhat strange as it indicates an STP target but
not a SATA device. The phy is also flagged as virtual, which means
that the target port is within the expander.

So my guess is that the mptsas driver (or firmware) skips that
device while the aic94xx driver tries to set it up as a SATA
target and falls off the rails (naturally it shouldn't oops).
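
The combination in question can be expressed as a small predicate over
the fields of the discover response. This is only a sketch under stated
assumptions: struct sketch_discover and sketch_skip_sata_setup() are
hypothetical, and the "skip" policy mirrors the guess above about mptsas
rather than anything read from either driver.

```c
#include <stdbool.h>

/* Hypothetical subset of the fields printed by smp_discover. */
struct sketch_discover {
    bool target_stp;   /* "attached target: stp=" */
    bool sata_device;  /* "attached target: sata_device=" */
    bool virtual_phy;  /* "virtual phy:" */
};

/*
 * Assumed policy: a phy that advertises an STP target but no SATA
 * device is not set up as a SATA target.  virtual_phy is carried
 * along for logging; it does not change the decision here.
 */
static bool sketch_skip_sata_setup(const struct sketch_discover *d)
{
    return d->target_stp && !d->sata_device;
}
```

With the values from the response above (stp=1, sata_device=0,
virtual phy: 1) the predicate is true.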

Hopefully Luben chips in here with what should happen ...

>> To verify that expander SAS address, try this:
>>   find /sys -name "sas_device:expander*"
>> cd to any directory found and try "cat sas_address".
> 
> 0x500508b300a27a2f
> 
>>
>> BTW there is a smp_utils version 0.92 beta at
>> http://www.torque.net/sg
>> the error messages are somewhat clearer.

Eric Moore pointed out to me that the ioc_num can also be
found in /proc/scsi/mptsas/ and
/sys/class/scsi_host/host/unique_id
so I have updated the smp_utils documentation.


So this is an instructive case of using one manufacturer's
HBA and driver to debug a driver for another manufacturer's
HBA.

Doug Gilbert



[2.6 PATCH] sym53c8xx_2 claims cpqarray device

2006-11-29 Thread Chip Coldwell

Apropos this thread

http://marc.theaimsgroup.com/?l=linux-scsi&m=115591706804045&w=2

which led to this patch

http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920

do we not also need the following patch, nine lines lower in the same file?

Signed-off-by: Chip Coldwell <[EMAIL PROTECTED]>

--- linux-2.6.18/drivers/scsi/sym53c8xx_2/sym_glue.c  2006-09-20 09:12:06.0 +0530
+++ linux-2.6.18/drivers/scsi/sym53c8xx_2/sym_glue.c  2006-10-10 19:14:56.0 +0530
@@ -2094,7 +2094,7 @@
{ PCI_VENDOR_ID_LSI_LOGIC, PCI_DEVICE_ID_NCR_53C875,
  PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL },
{ PCI_VENDOR_ID_LSI_LOGIC, PCI_DEVICE_ID_NCR_53C1510,
- PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL }, /* new */
+ PCI_ANY_ID, PCI_ANY_ID,  PCI_CLASS_STORAGE_SCSI<<8,  0x00, 0UL }, /* new */
{ PCI_VENDOR_ID_LSI_LOGIC, PCI_DEVICE_ID_LSI_53C895A,
  PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0UL },
{ PCI_VENDOR_ID_LSI_LOGIC, PCI_DEVICE_ID_LSI_53C875A,

Chip
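
For readers unfamiliar with the class and class_mask fields of a PCI ID
table entry: as far as I know the PCI core compares the masked class bits
of the entry and the device, so a nonzero mask lets a driver decline a
device that shares vendor and device IDs but reports a different class.
The sketch below models that comparison in user space. The names are
hypothetical, and the 0xffff00 mask in the usage note is an illustrative
value chosen for this example, not one taken from the patch text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base class and subclass codes for SCSI and RAID storage controllers. */
#define SKETCH_CLASS_STORAGE_SCSI 0x0100u
#define SKETCH_CLASS_STORAGE_RAID 0x0104u

/*
 * Model of the class test: the entry matches when every bit selected
 * by id_mask is equal in id_class and dev_class.  A mask of zero
 * selects no bits, so the class is then ignored.
 */
static bool sketch_class_matches(uint32_t id_class, uint32_t id_mask,
                                 uint32_t dev_class)
{
    return ((id_class ^ dev_class) & id_mask) == 0;
}
```

With id_class = SKETCH_CLASS_STORAGE_SCSI << 8 and a mask of 0xffff00, a
device reporting the RAID class does not match, while with a mask of 0 it
does.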

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc.
1-978-392-2426


Re: [2.6 patch] drivers/scsi/scsi_error.c should #include "scsi_transport_api.h"

2006-11-29 Thread Matthew Wilcox
On Wed, Nov 29, 2006 at 11:04:22AM +0100, Adrian Bunk wrote:
> +#include "scsi_transport_api.h"

scsi_transport_api.h is a weird little file.  It's not included by
anything in the drivers/scsi directory, only
drivers/scsi/libsas/sas_scsi_host.c:#include "../scsi_transport_api.h"
drivers/ata/libata-eh.c:#include "../scsi/scsi_transport_api.h"

To me, that says it should be living in include/scsi/ somewhere ...
maybe just put the one function prototype into scsi_eh.h?


Re: [2.6 patch] drivers/scsi/scsi_error.c should #include "scsi_transport_api.h"

2006-11-29 Thread James Smart



Matthew Wilcox wrote:

On Wed, Nov 29, 2006 at 11:04:22AM +0100, Adrian Bunk wrote:

+#include "scsi_transport_api.h"


scsi_transport_api.h is a weird little file.  It's not included by
anything in the drivers/scsi directory, only
drivers/scsi/libsas/sas_scsi_host.c:#include "../scsi_transport_api.h"
drivers/ata/libata-eh.c:#include "../scsi/scsi_transport_api.h"

To me, that says it should be living in include/scsi/ somewhere ...
maybe just put the one function prototype into scsi_eh.h?


Would it only go in include/scsi if it is intended to be an exported
API for LLDDs and/or user apps, and stay in drivers/scsi if it's
an internal API within the scsi subsystem itself?

Based on who uses it, I would say it's internal right now.

-- james s


Re: [2.6 patch] drivers/scsi/scsi_error.c should #include "scsi_transport_api.h"

2006-11-29 Thread Matthew Wilcox
On Wed, Nov 29, 2006 at 09:13:35AM -0500, James Smart wrote:
> Would it only go in include/scsi if it is intended to be an exported
> API for LLDDs and/or user apps, and stay in drivers/scsi if it's
> an internal API within the scsi subsystem itself?

It isn't clear to me that's the intended use of include/scsi.  If it is,
it's already being violated, eg by

$ find * -type f |xargs grep scsi_host_scan_allowed
drivers/scsi/scsi_scan.c:   if (scsi_host_scan_allowed(shost))
drivers/scsi/scsi_scan.c:   if (scsi_host_scan_allowed(shost))
drivers/scsi/scsi_scan.c:   if (scsi_host_scan_allowed(shost)) {
drivers/scsi/scsi_scan.c:   if (!scsi_host_scan_allowed(shost))
include/scsi/scsi_host.h: * scsi_host_scan_allowed - Is scanning of this host allowed
include/scsi/scsi_host.h:static inline int scsi_host_scan_allowed(struct Scsi_Host *shost)

(a good candidate to be moved to scsi_scan.c, in fact!)

scsi_host_state_name, scsi_normalize_sense, scsi_reset_provider,
scsi_test_unit_ready, scsi_put_command are all used in a similar way to
scsi_schedule_eh.  There are probably more; I just picked some
likely-looking candidates.


Re: aic94xx panic on module load

2006-11-29 Thread Mark Haverkamp
On Tue, 2006-11-28 at 10:01 -0800, Darrick J. Wong wrote:
> Mark Haverkamp wrote:
> > I got this panic when loading the aic94xx module.  The adapter is
> > connected to an HP MSA50 SAS enclosure with 3 72GB SAS disks.
> > 
> > Kernel 2.6.19-rc6-scsi-misc on an x86_64
> 
> > sas: task finished with resp:0x0, stat:0x89
> > sas: sas_discover_sata() for device 500508b300a27a2c at 500508b300a27a2f:0xc returned 0xff06
> > kobject_add failed for port-2:0:12 with -EEXIST, don't try to register things with the same name in the same directory.
> 
> Your expander is reporting your SAS disks to aic94xx as SATA disks,
> which is why the sas_discover_sata fails.  I don't know why it would do
> that... flaky hardware?  I'm not really sure what to do when we're given
> bad information.
> 
> > Kernel BUG at drivers/scsi/libsas/sas_expander.c:603
> 
> I believe this BUG is fixed by a few patches in aic94xx-sas.  For sure
> you'll want the patch named "libsas: better error handling in
> sas_ex_discover_end_dev()" patch; see commit
> 82f6bc0849b6fce9a965dde11dd6f685adc7285e.
> 
> There are some dependencies:
> e384a0bdd9d3abb5ba2f6eac9ac4d0ac61e1c6a1 ->
> 1f8787b198c4ba058a0bfc06c2ca7f301168a5dd ->
> 82f6bc0849b6fce9a965dde11dd6f685adc7285e.


I tried out the aic94xx-sas-2.6 kernel on my system.  It didn't panic
and did find the disks.  It looks like those fixes that you mentioned
did it.  Do you know when they may be propagated towards one of the scsi
trees or the main kernel tree?

Thanks,
Mark.

Nov 29 11:29:10 odt2-003 kernel: aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.2 unloaded
Nov 29 11:29:19 odt2-003 kernel: aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.2 loaded
Nov 29 11:29:19 odt2-003 kernel: PCI: Enabling device :08:01.0 (0110 -> 0113)
Nov 29 11:29:19 odt2-003 kernel: ACPI: PCI Interrupt :08:01.0[A] -> GSI 28 (level, low) -> IRQ 28
Nov 29 11:29:19 odt2-003 kernel: aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device :08:01.0
Nov 29 11:29:19 odt2-003 kernel: scsi3 : aic94xx
Nov 29 11:29:19 odt2-003 kernel: aic94xx: BIOS present (1,2), 1673
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ue num:3, ue size:88
Nov 29 11:29:19 odt2-003 kernel: aic94xx: manuf sect SAS_ADDR 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: manuf sect PCBA SN 0BD0C625005W
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: num_phy_desc: 8
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy0: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy1: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy2: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy3: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy4: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy5: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy6: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: phy7: ENEBLEABLE
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: max_phys:0x8, num_phys:0x8
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ms: enabled_phys:0xff
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy0: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy1: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy2: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy3: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy4: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy5: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy6: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: ctrla: phy7: sas_addr: 5d100045af00, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
Nov 29 11:29:19 odt2-003 kernel: aic94xx: max_scbs:512, max_ddbs:128
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy0 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy1 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy2 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy3 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy4 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy5 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy6 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: setting phy7 addr to 5d100045af00
Nov 29 11:29:19 odt2-003 kernel: aic94xx: num_edbs:21
Nov 29 11:29:19 odt2-003 kernel: aic94xx: num_escbs:3
Nov 29 11:29:19 odt2-003 kernel: aic94xx: using sequencer V17/10c6
Nov 29 11:29:19 odt2-003

problems to expect with >2TB volumes

2006-11-29 Thread Bernd Schubert
Hi,

we have not bought the device yet, but are presently in the process of doing so.
Before we buy it, I want to know about problems in advance...

I'm somewhat worried about this problem report
http://lists.freebsd.org/pipermail/aic7xxx/2006-January/thread.html#4280
Especially as I don't see a final solution... 
We also want to buy the very same raid device and also connect it to an
already existing aic79xx controller.

Thanks in advance,
Bernd




Re: problems to expect with >2TB volumes

2006-11-29 Thread Douglas Gilbert
Bernd Schubert wrote:
> Hi,
> 
> we have not bought the device yet, but are presently in the process of doing so.
> Before we buy it, I want to know about problems in advance...

None that I'm aware of from the point of view of the
Linux SCSI subsystem (starting at about half way through
the lk 2.4 series or 4 years ago).

> I'm somewhat worried about this problem report
> http://lists.freebsd.org/pipermail/aic7xxx/2006-January/thread.html#4280
> Especially as I don't see a final solution... 
> We also want to buy the very same raid device and also connect it to an
> already existing aic79xx controller.

On reviewing that thread, the original poster was
jumping to premature conclusions. Justin Gibbs told
him there was no such problem (and Justin is well
placed to know). Then the final post shows a trace
with READ(10) commands failing. They are 32-bit LBA
read operations that have been the default for about
10 years in the SCSI subsystem. If those fail on that
transport it is probably a termination problem.
When Justin saw that he probably didn't bother
responding again.
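
To make the 32-bit point concrete: READ(10) carries a 4-byte LBA, so with
512-byte blocks it can address 2^32 * 512 = 2^41 bytes, i.e. 2 TiB, and
anything beyond that needs a command with a wider LBA field such as
READ(16). Here is a minimal sketch of choosing between the two. The
function name is hypothetical and this is not the sd driver's code, only
the CDB layouts as I understand them from SBC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_READ_10 0x28  /* READ(10) operation code */
#define SKETCH_READ_16 0x88  /* READ(16) operation code */

/*
 * Fill cdb with a READ(10) if the LBA fits in 32 bits and the block
 * count fits in 16 bits, else with a READ(16).  Returns the CDB length.
 * All multi-byte fields are big-endian.
 */
static size_t sketch_build_read(uint64_t lba, uint32_t nblocks,
                                uint8_t cdb[16])
{
    int i;

    memset(cdb, 0, 16);
    if (lba <= UINT32_MAX && nblocks <= UINT16_MAX) {
        cdb[0] = SKETCH_READ_10;
        for (i = 0; i < 4; i++)          /* bytes 2..5: LBA */
            cdb[2 + i] = (uint8_t)(lba >> (24 - 8 * i));
        cdb[7] = (uint8_t)(nblocks >> 8);  /* bytes 7..8: length */
        cdb[8] = (uint8_t)nblocks;
        return 10;
    }
    cdb[0] = SKETCH_READ_16;
    for (i = 0; i < 8; i++)              /* bytes 2..9: LBA */
        cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
    for (i = 0; i < 4; i++)              /* bytes 10..13: length */
        cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
    return 16;
}
```

A failing READ(10) in a trace therefore says nothing about >2TB support
by itself, since the same command is used for every block below that
boundary.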

Doug Gilbert


Infinite retries reading the partition table

2006-11-29 Thread Luben Tuikov
Suppose reading sector 0 always reports an error,
sense key HARDWARE ERROR.

What I'm observing is that the request to read sector 0,
reading partition information, is retried forever, ad infinitum.

Does anyone have a patch to resolve this? (2.6.19-rc6)

Thanks,
Luben



Re: Infinite retries reading the partition table

2006-11-29 Thread Luben Tuikov
--- Luben Tuikov <[EMAIL PROTECTED]> wrote:

> Suppose reading sector 0 always reports an error,
> sense key HARDWARE ERROR.
> 
> What I'm observing is that the request to read sector 0,
> reading partition information, is retried forever, ad infinitum.
> 
> Does anyone have a patch to resolve this? (2.6.19-rc6)

Actually the device sends SK: MEDIUM ERROR, ASC: UNRECOVERED READ ERR,
but SCSI Core seems to retry reading the partition table (sector 0)
forever.

Anyone seen this and/or has a patch in their tree for it?

   Luben
P.S.  This is fairly straightforward to inject/test.



[PATCH] [SCSI] Fix sense key MEDIUM ERROR processing and retry

2006-11-29 Thread Luben Tuikov
1) If the device reports an uncorrectable MEDIUM ERROR, such
as SK MEDIUM ERROR, ASC UNRECOVERED READ ERR, AMNF DATA
FIELD or RECORD NOT FOUND, then: In scsi_check_sense()
return SUCCESS so as to not retry -- the error is
uncorrectable -- this speeds up total processing time.

2) In scsi_io_completion(), retry if and only if there was
at least one byte completed, i.e. good_bytes != 0.  If
good_bytes == 0, don't try to retry.

Without this patch, SCSI Core gets hung reading sector 0
forever, for example when reading the partition table of a
(newly discovered) device.
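
The two rules can be modelled outside the kernel as follows. This is a
sketch for discussion, not the patched functions themselves: the names
are hypothetical, and "SUCCESS" is taken, as in point 1, to mean that the
command is finished without another attempt.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_disposition {
    SKETCH_SUCCESS,      /* finish the command, do not retry */
    SKETCH_NEEDS_RETRY   /* send the command again */
};

/* Rule 1: disposition for sense key MEDIUM ERROR, keyed on the ASC. */
static enum sketch_disposition sketch_medium_error(uint8_t asc)
{
    switch (asc) {
    case 0x11:  /* UNRECOVERED READ ERR */
    case 0x13:  /* AMNF DATA FIELD */
    case 0x14:  /* RECORD NOT FOUND */
        return SKETCH_SUCCESS;
    default:
        return SKETCH_NEEDS_RETRY;
    }
}

/* Rule 2: retry the remainder only if some bytes actually completed. */
static bool sketch_retry_remainder(unsigned int good_bytes)
{
    return good_bytes != 0;
}
```

For the sector 0 case, ASC 0x11 with good_bytes == 0 gives "do not retry"
under both rules.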

Signed-off-by: Luben Tuikov <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_error.c |5 +
 drivers/scsi/scsi_lib.c   |3 ++-
 2 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index aff1b0c..011dd32 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -359,6 +359,11 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
return SUCCESS;
 
case MEDIUM_ERROR:
+   if (sshdr.asc == 0x11 || /* UNRECOVERED READ ERR */
+   sshdr.asc == 0x13 || /* AMNF DATA FIELD */
+   sshdr.asc == 0x14) { /* RECORD NOT FOUND */
+   return SUCCESS;
+   }
return NEEDS_RETRY;
 
case HARDWARE_ERROR:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8b208b4..c5fa329 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -866,7 +866,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 * are leftovers and there is some kind of error
 * (result != 0), retry the rest.
 */
-   if (scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
+   if (good_bytes &&
+   scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
return;
 
/* good_bytes = 0, or (inclusive) there were leftovers and
-- 
1.4.4.1.g7a0e



Re: aic94xx panic on module load

2006-11-29 Thread Luben Tuikov
--- Douglas Gilbert <[EMAIL PROTECTED]> wrote:
> Mark Haverkamp wrote:
> > On Tue, 2006-11-28 at 20:52 -0500, Douglas Gilbert wrote:
> >> Mark Haverkamp wrote:
> >>> On Tue, 2006-11-28 at 13:46 -0800, Mark Haverkamp wrote:
> >>>> On Tue, 2006-11-28 at 13:44 -0500, Douglas Gilbert wrote:
> >>>>
> >>>> [ ... ]
> 
> >>> I don't know if this helps, but I found the verbose option.  Here is a
> >>> little debug output.
> >>>
> >>>
> >>> ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl
> >>> Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
> >>> send_req_mpt: subvalue=0  SAS address=0x500508b300a27a2f
> >>> mptctl two scatter gather list interface
> >>> IOCStatus=0x1
> >>> IOCStatus=0x1 IOCLogInfo=0xA27A2F SASStatus=0x0
> >>> smp_send_req failed, res=-1
> >> Mark,
> >> The iocnum may be greater than 0 (especially if you have
> >> other MPT Fusion HBAs (any kind) in that computer).
> >> Have a look in the log around where the mptsas driver
> >> is registered and look for the string "ioc". The number
> >> following "ioc" is what you need. If you find "ioc3" then
> >> try:
> >>
> >>  ./smp_discover -p 12 -s 0x500508b300a27a2f /dev/mptctl,3
> > 
> > OK thanks, I do have an mptspi card too. 
> >  
> > # ./smp_discover -v  -p 12 -s 0x500508b300a27a2f /dev/mptctl,2
> > Discover request: 40 10 00 02 00 00 00 00 00 0c 00 00 00 00 00 00
> > send_req_mpt: subvalue=2  SAS address=0x500508b300a27a2f
> > mptctl two scatter gather list interface
> > Discover response:
> >   expander change count: 0
> >   phy identifier: 12
> >   attached device type: end device
> >   negotiated physical link rate: phy enabled; 3 Gbps
>
> A "heads up" here: the "physical" has now been changed
> to "logical" in SAS-2. The idea is that up to 4
> logical 1.5 Gbps links (e.g. to SATA disks) can be
> multiplexed on one 6 Gbps physical link.
> 
> >   attached initiator: ssp=0 stp=0 smp=0 sata_host=0
> >   attached sata port selector: 0
> >   attached target: ssp=0 stp=1 smp=0 sata_device=0
> >   SAS address: 0x500508b300a27a2f
> >   attached SAS address: 0x500508b300a27a2c
> >   attached phy identifier: 0
> >   attached inside ZPSDS persistent: 0
> >   attached requested inside ZPSDS: 0
> >   attached break_reply capable: 0
> >   programmed minimum physical link rate: 3 Gbps
> >   hardware minimum physical link rate: 3 Gbps
> >   programmed maximum physical link rate: 3 Gbps
> >   hardware maximum physical link rate: 3 Gbps
> >   phy change count: 105
> >   virtual phy: 1
> >   partial pathway timeout value: 7 us
> >   routing attribute: direct
> >   connector type: No information
> >   connector element index: 0
> >   connector physical link: 0
> 
> Mark,
> Finally ...
> So the above is somewhat strange as it indicates an STP target but
> not a SATA device. The phy is also flagged as virtual, which means
> that the target port is within the expander.
> 
> So my guess is that the mptsas driver (or firmware) skips that
> device while the aic94xx driver tries to set it up as a SATA
> target and falls off the rails (naturally it shouldn't oops).
> 
> Hopefully Luben chips in here with what should happen ...

I see a problematic expander: a SAS phy reset sequence occurred,
but STP is set to 1, and on top of this the phy is indicated
to be virtual.

What is the intention here?  A testing bed for sending and
receiving SATA frames?

It is not inconceivable that someone would do this, although
I wouldn't recommend it for production systems.

 Luben

> >> To verify that expander SAS address, try this:
> >>   find /sys -name "sas_device:expander*"
> >> cd to any directory found and try "cat sas_address".
> > 
> > 0x500508b300a27a2f
> > 
> >>
> >> BTW there is a smp_utils version 0.92 beta at
> >> http://www.torque.net/sg
> >> the error messages are somewhat clearer.
> 
> Eric Moore pointed out to me that the ioc_num can also be
> found in /proc/scsi/mptsas/ and
> /sys/class/scsi_host/host/unique_id
> so I have updated the smp_utils documentation.
> 
> 
> So this is an instructive case of using one manufacturer's
> HBA and driver to debug a driver for another manufacturer's
> HBA.
> 
> Doug Gilbert
> 
> 
