I have implemented the suggestions and comments from Christoph's reply
to my RFC.
--------------------------------------------------------------------
Due to a firmware mismatch between a host and target (names withheld to
protect the innocent?), the LLDD was returning DID_RESET for every
i/o command. This patch modifies the scsi layer to take into account
when the command which received DID_RESET was issued and eventually
give up on it instead of unconditionally reissuing it forever.
With this patch, on my test system, a command that receives DID_RESET
on every attempt times out after about 360 seconds.
The premise for this patch is that no command should have an infinite
lifetime. The impetus for this patch was a system which would not
reach a command prompt without disconnecting the storage from the
host.
The significant change in this patch is to call scsi_queue_insert()
instead of scsi_requeue_command() if the command which receives a
DID_RESET did not complete any i/o (good_bytes==0). scsi_queue_insert()
does not release the command and regenerate it like scsi_requeue_command()
does, hence jiffies_at_alloc reflects when the command was first issued.
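To illustrate why the choice of requeue path matters for the lifetime
check, here is a minimal user-space sketch. It is not the kernel code:
struct model_cmd, the model_* helpers, and the max_lifetime parameter are
hypothetical names invented for illustration, and the expiry rule (age
compared against a fixed limit) is an assumption. Only the
jiffies_at_alloc field name and the described behaviour of the two
requeue paths come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct scsi_cmnd; only the field discussed here. */
struct model_cmd {
	unsigned long jiffies_at_alloc;	/* tick at which the command was allocated */
};

/*
 * Models scsi_requeue_command(): the command is released and regenerated,
 * so its allocation timestamp restarts at the current tick.
 */
static void model_requeue(struct model_cmd *cmd, unsigned long now)
{
	cmd->jiffies_at_alloc = now;
}

/*
 * Models scsi_queue_insert(): the same command goes back on the queue,
 * so its allocation timestamp is left untouched.
 */
static void model_queue_insert(struct model_cmd *cmd, unsigned long now)
{
	(void)cmd;
	(void)now;
}

/* Age in ticks; unsigned subtraction stays correct across counter wraparound. */
static unsigned long model_cmd_age(const struct model_cmd *cmd, unsigned long now)
{
	return now - cmd->jiffies_at_alloc;
}

/* Assumed expiry rule: expired once the age reaches max_lifetime ticks. */
static bool model_cmd_expired(const struct model_cmd *cmd, unsigned long now,
			      unsigned long max_lifetime)
{
	assert(max_lifetime > 0);
	return model_cmd_age(cmd, now) >= max_lifetime;
}
```

In this model a command that is always requeued never ages, so it can
never expire, while a command that is reinserted keeps accumulating age
from its first issue.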
Per Christoph's suggestion, I have broken this into two patches. The
first moves the scsi_release_buffers() calls so that the change
can be more easily reviewed. The second patch implements the
lifetime timer for commands receiving DID_RESET.
These patches differ from the first posting in three ways: scsi_queue_insert()
is called directly instead of the since-removed scsi_retry_command();
comments have been cleaned up; and a comment has been added to indicate
that the code is supposed to fall through to call scsi_end_request()
if the command which received DID_RESET has expired.
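The resulting DID_RESET handling can be summarized with the sketch
below. It is a stand-alone model of the control flow only, not the code
from the patch: the enum, its values, and model_did_reset_action() are
hypothetical names, and the expired flag stands in for the result of the
expiry check on the command.

```c
#include <stdbool.h>

/* Hypothetical labels for the three possible outcomes. */
enum model_action {
	MODEL_REINSERT,		/* scsi_queue_insert(): reissue the same command */
	MODEL_REQUEUE,		/* scsi_requeue_command(): regenerate for untransferred data */
	MODEL_END_REQUEST	/* fall through to scsi_end_request() */
};

/*
 * Decide what to do with a command that completed with DID_RESET.
 * good_bytes is the amount of data transferred; expired says whether
 * the command has outlived its allowed lifetime.
 */
static enum model_action model_did_reset_action(unsigned int good_bytes, bool expired)
{
	if (!good_bytes) {
		if (!expired)
			return MODEL_REINSERT;
		/* Nothing transferred and lifetime exceeded: fall through. */
	} else {
		return MODEL_REQUEUE;
	}
	return MODEL_END_REQUEST;
}
```

Only the case of no data transferred plus an expired command reaches the
final return, which corresponds to the intentional fall-through.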
These patches were tested by modifying a LLDD to return DID_RESET for
every command received.
Patches apply to 2.6.20-rc6-git1.
Signed-off-by: Michael Reed <[EMAIL PROTECTED]>
Christoph Hellwig wrote:
> On Mon, Dec 11, 2006 at 03:42:34PM -0600, Michael Reed wrote:
>> Due to a firmware mismatch between a host and target (names withheld to
>> protect the innocent?), the LLDD was returning DID_RESET for every
>> i/o command. This patch modifies the scsi layer to take into account
>> when the command which received DID_RESET was issued and eventually
>> give up on it instead of unconditionally reissuing it forever
>> when it receives a DID_RESET. With this patch, on my test system,
>> the command receiving the constant DID_RESET times out after about
>> 360 seconds.
>>
>> The premise for this patch is that no command should have an infinite
>> lifetime. The impetus for this patch was a system which would not
>> reach a command prompt without disconnecting the storage from the
>> host.
>>
>> The significant change in this patch is to call scsi_retry_command()
>> instead of scsi_requeue_command() if the command which receives a
>> DID_RESET did not complete any i/o (good_bytes==0). scsi_retry_command()
>> does not release the command and regenerate it like scsi_requeue_command()
>> does, hence jiffies_at_alloc reflects when the command was first issued.
>
> Generally this patch looks good to me. Some comments:
>
>
>> -extern int scsi_retry_command(struct scsi_cmnd *cmd);
>> +extern int scsi_retry_command(struct scsi_cmnd *cmd, int reason);
>
> I've just sent a patch to kill scsi_retry_command. Use
> scsi_queue_insert directly instead.
>
>> + scsi_release_buffers(cmd);
>
> Can you please separate out the moving of the scsi_release_buffer
> calls into a separate patch so it can be audited better?
>
>> @@ -961,9 +992,20 @@ void scsi_io_completion(struct scsi_cmnd
>> /* Third party bus reset or reset for error recovery
>> * reasons. Just retry the request and see what
>> * happens.
>> + * If no data was transferred, just reissue this
>> + * command. If data was transferred, regenerate
>> + * the command to transfer only untransferred data.
>> */
>
> The whole comment should look more like:
>
> /*
> * Third party bus reset or reset for error recovery reasons.
> * If no data was transferred, just reissue this command.
> * If data was transferred, regenerate the command to transfer
> * only untransferred data.
> */
>
>> - scsi_requeue_command(q, cmd);
>> - return;
>> + if (!good_bytes) {
>> + if (!(scsi_command_expired(cmd))) {
>> + scsi_retry_command(cmd, SCSI_MLQUEUE_DID_RESET);
>> + return;
>> + }
>> + }
>> + else {
>> + scsi_requeue_command(q, cmd);
>> + return;
>> + }
>
> With this code we now fallthrough if we don't have any good bytes
> and the command has expired. Is this the expected behaviour? If
> yes we need a good comment describing it.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>