I have implemented the suggestions and comments Christoph made in his
reply to my RFC.
--------------------------------------------------------------------

Due to a firmware mismatch between a host and target (names withheld to
protect the innocent?), the LLDD was returning DID_RESET for every
I/O command.  This patch modifies the SCSI layer to take into account
when a command that received DID_RESET was first issued, and to
eventually give up on it instead of unconditionally reissuing it forever.
With this patch, on my test system, a command that receives DID_RESET
on every attempt times out after about 360 seconds.

The premise for this patch is that no command should have an infinite
lifetime.  The impetus for this patch was a system which would not
reach a command prompt without disconnecting the storage from the
host.

The significant change in this patch is to call scsi_queue_insert()
instead of scsi_requeue_command() if the command which receives a
DID_RESET did not complete any i/o (good_bytes==0).  scsi_queue_insert()
does not release the command and regenerate it like scsi_requeue_command()
does, hence jiffies_at_alloc reflects when the command was first issued.
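
For reviewers, the decision described above can be modelled in userspace
roughly as follows.  This is only a sketch and not the patch itself:
struct model_cmd, its lifetime field, model_cmd_expired(), on_did_reset()
and the enum names are all made up for illustration.  issued_at stands
in for jiffies_at_alloc, and the lifetime value is an assumption rather
than whatever limit the kernel actually applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum reset_action {
	ACTION_REINSERT,	/* models scsi_queue_insert(): same command, timestamp kept */
	ACTION_REQUEUE,		/* models scsi_requeue_command(): command regenerated */
	ACTION_END_REQUEST	/* models falling through to scsi_end_request() */
};

struct model_cmd {
	unsigned long issued_at;	/* stands in for jiffies_at_alloc */
	unsigned long lifetime;		/* assumed maximum lifetime, in ticks */
};

/* True once the command has been outstanding for its whole lifetime. */
static bool model_cmd_expired(const struct model_cmd *cmd, unsigned long now)
{
	/* Unsigned subtraction stays correct if the tick counter wraps. */
	return (now - cmd->issued_at) >= cmd->lifetime;
}

/* Decide what to do with a command that completed with DID_RESET. */
static enum reset_action on_did_reset(const struct model_cmd *cmd,
				      unsigned int good_bytes,
				      unsigned long now)
{
	assert(cmd != NULL);

	/* Some data moved: regenerate the command for the remainder. */
	if (good_bytes)
		return ACTION_REQUEUE;

	/* No data moved and still within its lifetime: reissue as-is. */
	if (!model_cmd_expired(cmd, now))
		return ACTION_REINSERT;

	/* No data moved and the lifetime has run out: give up. */
	return ACTION_END_REQUEST;
}
```

Because the reinsert path never resets issued_at in this model, a command
that makes no progress eventually reaches ACTION_END_REQUEST, whereas
regenerating it on every DID_RESET would restart the clock each time.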

Per Christoph's suggestion, I have broken this into two patches.  The
first moves the scsi_release_buffers() calls so that the change can be
more easily reviewed.  The second patch implements the lifetime timer
for commands receiving DID_RESET.

These patches differ from the first posting in that scsi_queue_insert()
is called directly instead of calling the since-removed
scsi_retry_command(); comments have been cleaned up; and a comment has
been added to indicate that the code is supposed to fall through to
call scsi_end_request() if the command which received DID_RESET has
expired.

These patches were tested by modifying a LLDD to return DID_RESET for
every command received.

Patches apply to 2.6.20-rc6-git1.

Signed-off-by: Michael Reed <[EMAIL PROTECTED]>

Christoph Hellwig wrote:
> On Mon, Dec 11, 2006 at 03:42:34PM -0600, Michael Reed wrote:
>> Due to a firmware mismatch between a host and target (names withheld to
>> protect the innocent?), the LLDD was returning DID_RESET for every
>> i/o command.  This patch modifies the scsi layer to take into account
>> when the command which received DID_RESET was issued and eventually
>> give up on it instead of unconditionally reissuing it forever
>> when it receives a DID_RESET.  With this patch, on my test system,
>> the command receiving the constant DID_RESET times out after about
>> 360 seconds.
>>
>> The premise for this patch is that no command should have an infinite
>> lifetime.  The impetus for this patch was a system which would not
>> reach a command prompt without disconnecting the storage from the
>> host.
>>
>> The significant change in this patch is to call scsi_retry_command()
>> instead of scsi_requeue_command() if the command which receives a
>> DID_RESET did not complete any i/o (good_bytes==0).  scsi_retry_command()
>> does not release the command and regenerate it like scsi_requeue_command()
>> does, hence jiffies_at_alloc reflects when the command was first issued.
> 
> Generally this patch looks good to me.  Some comments:
> 
> 
>> -extern int scsi_retry_command(struct scsi_cmnd *cmd);
>> +extern int scsi_retry_command(struct scsi_cmnd *cmd, int reason);
> 
>       I've just sent a patch to kill scsi_retry_command.  Use
>       scsi_queue_insert directly instead.
> 
>> +    scsi_release_buffers(cmd);
> 
>       Can you please separate out the moving of the scsi_release_buffer
>       calls into a separate patch so it can be audited better?
> 
>> @@ -961,9 +992,20 @@ void scsi_io_completion(struct scsi_cmnd
>>              /* Third party bus reset or reset for error recovery
>>               * reasons.  Just retry the request and see what
>>               * happens.
>> +             * If no data was transferred, just reissue this
>> +             * command.  If data was transferred, regenerate
>> +             * the command to transfer only untransferred data.
>>               */
> 
> The whole comment should look more like:
> 
>               /*
>                * Third party bus reset or reset for error recovery reasons.
>                * If no data was transferred, just reissue this command.
>                * If data was transferred, regenerate the command to transfer
>                * only untransferred data.
>                */
> 
>> -            scsi_requeue_command(q, cmd);
>> -            return;
>> +            if (!good_bytes) {
>> +                    if (!(scsi_command_expired(cmd))) {
>> +                            scsi_retry_command(cmd, SCSI_MLQUEUE_DID_RESET);
>> +                            return;
>> +                    }
>> +            }
>> +            else {
>> +                    scsi_requeue_command(q, cmd);
>> +                    return;
>> +            }
> 
>       With this code we now fallthrough if we don't have any good bytes
>       and the command has expired.  Is this the expected behaviour? If
>       yes we need a good comment describing it.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 