On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote: > + Linux-scsi > > > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > > index 9497b47..57432be 100644 > > > --- a/block/blk-mq.h > > > +++ b/block/blk-mq.h > > > @@ -175,6 +175,7 @@ static inline bool > > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, > > > struct request *rq) > > > { > > > + hctx->tags->rqs[rq->tag] = NULL; > > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > > rq->tag = -1; > > > > No SCSI driver should call scsi_host_find_tag() after a request has > > finished. The above patch introduces yet another race and hence can't be > > a proper fix. > > Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id to > find out pending IO in firmware. > One of the use case is - HBA firmware recovery. In case of firmware > recovery, driver may require to traverse the list and return back pending > scsi command to SML for retry. > I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, > mpt3sas are using API scsi_host_find_tag for the same purpose. > > Without this patch, we hit very basic kernel panic due to page fault. This > is not an issue in non-mq code path. Non-mq path use > blk_map_queue_find_tag() and that particular API does not provide stale > requests.
As I wrote before, your patch doesn't fix the race you described but only makes the race window smaller. If you want an example of how to use scsi_host_find_tag() properly, have a look at the SRP initiator driver (drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag() without triggering any NULL pointer dereferences. The approach used in that driver also works when having to support HBA firmware recovery. Bart.