On Tue, 2019-04-09 at 16:29 -0700, Jaesoo Lee wrote:
> Let me comment in line.
> 
> On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanass...@acm.org> wrote:
> > 
> > On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > > When SCSI blk-mq is enabled, there is a bug in how scsi_queue_rq handles
> > > errors. Specifically, the result field of scsi_request is not set
> > > correctly when dispatching the command fails. Since upper-layer code,
> > > including the sg_io ioctl, expects to receive any error status from the
> > > result field of scsi_request, the error is silently ignored, and this
> > > could cause data corruption for some applications. This commit also fixes
> > > another bug: the result field is not initialized when scsi_request is
> > > allocated.
> > > 
> > > Signed-off-by: Jaesoo Lee <ja...@purestorage.com>
> > > ---
> > >  block/scsi_ioctl.c      | 1 +
> > >  drivers/scsi/scsi_lib.c | 1 +
> > >  2 files changed, 2 insertions(+)
> > > 
> > > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > > index 533f4ae..f2d7979 100644
> > > --- a/block/scsi_ioctl.c
> > > +++ b/block/scsi_ioctl.c
> > > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > >         req->cmd = req->__cmd;
> > >         req->cmd_len = BLK_MAX_CDB;
> > >         req->sense_len = 0;
> > > +       req->result = 0;
> > >  }
> > >  EXPORT_SYMBOL(scsi_req_init);
> > 
> > What makes you think that this assignment is necessary?
> > 
> 
> Actually, I discovered this before fixing this bug and we might not
> see this problem anymore once this bug is fixed.
> 
> Previously, since we were not setting scsi_req(req)->result in
> scsi_queue_rq, I found that the application could receive a stale
> DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
> was allocated for the IO.
> 
> Please let me know if I need to remove this change.

Since SCSI LLDs have to set that result variable anyway when a request
completes successfully, I'd prefer not to add that assignment.

> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 2018967..af1488d 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > >                         ret = BLK_STS_DEV_RESOURCE;
> > >                 break;
> > >         default:
> > > +               scsi_req(req)->result = DID_NO_CONNECT << 16;
> > >                 /*
> > >                  * Make sure to release all allocated ressources when
> > >                  * we hit an error, as we will never see this command
> > 
> > What leads you to the conclusion that (ret != BLK_STS_OK &&
> > ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
> 
> I found this is what we do in the legacy queue case; I referred to
> the scsi_prep_return() and scsi_kill_request() code, where we always
> return DID_NO_CONNECT.
> 
> However, I think proper return code handling should be something like:
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..21e516e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>                         ret = BLK_STS_DEV_RESOURCE;
>                 break;
>         default:
> +               if (unlikely(!scsi_device_online(sdev)))
> +                       scsi_req(req)->result = DID_NO_CONNECT << 16;
> +               else
> +                       scsi_req(req)->result = DID_ERROR << 16;
>                 /*
>                  * Make sure to release all allocated ressources when
>                  * we hit an error, as we will never see this command

The above looks better to me than the original patch.
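
For illustration only, here is a minimal user-space sketch of how the
result value in that hunk is encoded and read back. The DID_* values and
the host_byte() extraction follow the kernel's SCSI headers as far as I
know them; dispatch_failure_result() is a hypothetical helper that only
mirrors the branch in the proposed patch and is not kernel code.

```c
/* Host byte codes; values assumed to match the kernel's SCSI headers. */
#define DID_OK          0x00
#define DID_NO_CONNECT  0x01
#define DID_ERROR       0x07

/* The host byte lives in bits 16..23 of the result word. */
#define host_byte(result) (((result) >> 16) & 0xff)

/*
 * Hypothetical helper mirroring the default: branch of the proposed patch.
 * An offline device is reported as DID_NO_CONNECT; any other dispatch
 * failure is reported as DID_ERROR.
 */
static unsigned int dispatch_failure_result(int device_online)
{
	if (!device_online)
		return (unsigned int)DID_NO_CONNECT << 16;
	return (unsigned int)DID_ERROR << 16;
}
```

With these definitions, dispatch_failure_result(0) yields 0x00010000 and
dispatch_failure_result(1) yields 0x00070000, so a caller that inspects
host_byte() sees a nonzero host status instead of a stale or zero value.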

Thanks,

Bart.
