On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote:
> + Linux-scsi
> 
> > > diff --git a/block/blk-mq.h b/block/blk-mq.h
> > > index 9497b47..57432be 100644
> > > --- a/block/blk-mq.h
> > > +++ b/block/blk-mq.h
> > > @@ -175,6 +175,7 @@ static inline bool
> > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
> > >   static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
> > >                          struct request *rq)
> > >   {
> > > +    hctx->tags->rqs[rq->tag] = NULL;
> > >       blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
> > >       rq->tag = -1;
> > 
> > No SCSI driver should call scsi_host_find_tag() after a request has
> > finished. The above patch introduces yet another race and hence can't be
> > a proper fix.
> 
> Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id to
> find out pending IO in firmware.
> One of the use case is -  HBA firmware recovery.  In case of firmware
> recovery, driver may require to traverse the list and return back pending
> scsi command to SML for retry.
> I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic,
> mpt3sas are using API scsi_host_find_tag for the same purpose.
> 
> Without this patch, we hit very basic kernel panic due to page fault.  This
> is not an issue in non-mq code path. Non-mq path use
> blk_map_queue_find_tag() and that particular API does not provide stale
> requests.

As I wrote before, your patch doesn't fix the race you described but only
makes the race window smaller. If you want an example of how to use
scsi_host_find_tag() properly, have a look at the SRP initiator driver
(drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag() without
triggering any NULL pointer dereferences. The approach used in that driver
also works when having to support HBA firmware recovery.

Bart.

Reply via email to