fw-sbp2 is unable to reconnect while performing __scsi_add_device because there is only a single workqueue thread context available for both at the moment. This should be fixed eventually.
Until then, take care that __scsi_add_device does not return success even though the SCSI high-level driver probing failed (sd READ_CAPACITY and friends) due to bus reset. The trick to do so is to use a different error indicator in the command completion as long as __scsi_add_device did not return. An actual failure of __scsi_add_device is easy to handle, but an incomplete execution of __scsi_add_device with an sdev returned would remain undetected and leave the SBP-2 target unusable. Signed-off-by: Stefan Richter <[EMAIL PROTECTED]> --- Jarod, does this work like I assume and fixes your setup of two OXFW911 based disks? drivers/firewire/fw-sbp2.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) Index: linux/drivers/firewire/fw-sbp2.c =================================================================== --- linux.orig/drivers/firewire/fw-sbp2.c +++ linux/drivers/firewire/fw-sbp2.c @@ -825,28 +825,17 @@ static void sbp2_login(struct work_struc sdev = __scsi_add_device(shost, 0, 0, scsilun_to_int(&eight_bytes_lun), lu); if (IS_ERR(sdev)) { - /* - * The most frequent cause for __scsi_add_device() to fail - * is a bus reset while sending the SCSI INQUIRY. Try again. - */ + /* Probably a bus reset happened. Try again. */ goto out_logout_login; } else if (sdev->sdev_state == SDEV_OFFLINE) { - /* - * FIXME: We are unable to perform reconnects while in - * sbp2_login(). Therefore __scsi_add_device() will get - * into trouble if a bus reset happens in parallel. - * It will either fail (that's OK, see above) or take sdev - * offline. Here is a crude workaround for the latter. - */ + /* Something else happened which left the device unusable. */ scsi_device_put(sdev); scsi_remove_device(sdev); goto out_logout_login; } else { - /* - * Can you believe it? Everything went well. - */ + /* Can you believe it? Everything went well. */ lu->sdev = sdev; smp_wmb(); /* We need lu->sdev when we want to block lu. */ atomic_dec(&lu->tgt->dont_block); @@ -1237,8 +1226,18 @@ complete_command_orb(struct sbp2_orb *ba * If the orb completes with status == NULL, something * went wrong, typically a bus reset happened mid-orb * or when sending the write (less likely). + * + * Normally, tell the SCSI core we are busy so that the + * command is retried. But if the sdev is still being + * set up, pretend that the device went away so that + * __scsi_add_device will fail. sbp2_login will retry it. + * + * FIXME: There is potentially a small window after + * __scsi_add_device returned but before lu->sdev is + * set during which we rather want to say DID_BUS_BUSY. */ - result = DID_BUS_BUSY << 16; + result = orb->lu->sdev != NULL ? DID_BUS_BUSY << 16 : + DID_NO_CONNECT << 16; sbp2_conditionally_block(orb->lu); } -- Stefan Richter -=====-==--- --=- --==- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/