On Sat, Jan 29, 2005 at 10:44:41AM -0600, James Bottomley wrote:
> On Fri, 2005-01-28 at 21:46 -0800, Andrew Vasquez wrote:
> > Returning back DID_IMM_RETRY for these 'transport' related conditions
> > would of course help in this issue -- but at the same time bring with it
> > several side-effects which may not be desirable.
> >
> > So, beyond this particular circumstance, what would be considered a
> > 'proper' return status for this type of event?
>
> Well, the correct return, since this is a condition from the storage, is
> simply the check condition and the sense code (rather than having the
> driver interpret it).

But the transport hit a failure, not the storage device. I thought Andrew
hit this sequence:

- pull / replace cable

- IO resumes but gets NOT_READY (the device could be logging back into
  the fibre or such)

- a FC transport problem is hit, DID_BUS_BUSY is returned, but
  scmd->retries has already been exhausted by the NOT_READY

Did I misread something?

> > > Would this be an approach to consider? Or should we tackle the problem
> > > by addressing the quirky (cmd->retries > cmd->allowed) state?
>
> That's what I think the correct approach should be....we have a few
> other quirky devices that aren't pleased with our current NOT_READY
> handling. Were you going to look into coding up a patch for this?

We don't track what errors caused a retry (doing so is too painful), or
reset the retries.

In scsi_decide_disposition(), if we get a few retry cases for one or
multiple errors, and then a different error that should reasonably be a
retry case, we return SUCCESS instead of NEEDS_RETRY.

Why not just set scmd->retries to zero in scsi_requeue_command()? All
callers are cases that we want to keep retrying if other errors are hit,
and it would fix other potential retry problems, not only the NOT_READY
case.

[There is one bad looking scsi_requeue_command() for UNIT_ATTENTION that
looks like it could retry forever, independent of this problem.]

Fixing the NOT_READY case to quiesce (and not incrementing retries) would
fix the problem or make it much less likely, and is still a good idea.

And as a long term goal, losing the retry count and moving to allowing all
retries for a period of time would avoid other potential problems, and not
be tied to the speed of the system.

-- Patrick Mansfield
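
To make the shared-counter problem concrete, here is a rough toy model in C. It is not kernel code: every toy_ name is hypothetical, the "<" comparison and the budget of 5 are assumptions chosen for illustration, and the real scsi_decide_disposition() and scsi_requeue_command() do far more than this. It only shows how NOT_READY retries can use up the budget before a transport error arrives, and what resetting the counter on requeue would change.

```c
#include <assert.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
enum toy_disposition { TOY_SUCCESS, TOY_NEEDS_RETRY };

struct toy_cmd {
    int retries;   /* retries charged so far, whatever the cause */
    int allowed;   /* retry budget for the command */
};

/* Requeue step; optionally applies the proposed reset of the counter. */
void toy_requeue(struct toy_cmd *cmd, int reset_retries)
{
    if (reset_retries)
        cmd->retries = 0;
}

/* A NOT_READY result: charged to the shared counter, then requeued. */
void toy_not_ready(struct toy_cmd *cmd, int reset_retries)
{
    cmd->retries++;
    toy_requeue(cmd, reset_retries);
}

/* A retryable transport error: retried only while budget remains. */
enum toy_disposition toy_transport_error(struct toy_cmd *cmd)
{
    if (++cmd->retries < cmd->allowed)
        return TOY_NEEDS_RETRY;
    return TOY_SUCCESS;   /* budget spent: the error goes to the upper layer */
}

/* Replay: not_ready_count NOT_READY results, then one transport error. */
enum toy_disposition toy_replay(int allowed, int not_ready_count,
                                int reset_retries)
{
    struct toy_cmd cmd = { 0, allowed };
    int i;

    assert(allowed > 0 && not_ready_count >= 0);
    for (i = 0; i < not_ready_count; i++)
        toy_not_ready(&cmd, reset_retries);
    return toy_transport_error(&cmd);
}
```

In this toy, toy_replay(5, 5, 0) yields TOY_SUCCESS (the transport error is not retried because NOT_READY already spent the budget), while toy_replay(5, 5, 1) yields TOY_NEEDS_RETRY. Note that the reset variant places no bound of its own on repeated NOT_READY results, which is the same kind of retry-forever risk mentioned for UNIT_ATTENTION.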