On Tue, 2018-12-18 at 12:38 +0530, Kashyap Desai wrote:
> V1 -> V2
> Added fix in __blk_mq_finish_request around blk_mq_put_tag() for
> non-internal tags
> 
> Problem statement :
> Whenever we try to get an outstanding request via scsi_host_find_tag(),
> the block layer returns stale entries instead of the actual outstanding
> request. The kernel panics if a stale entry is inaccessible or its memory is reused.
> Fix :
> Undo the request mapping in blk_mq_put_driver_tag() once the request is returned.
> 
> More detail :
> Whenever an SDEV entry is created, the block layer allocates separate tags
> and static requests. Those requests are not valid after the SDEV is deleted
> from the system. On the fly, the block layer maps static rqs to rqs as below
> in blk_mq_get_driver_tag():
> 
> data.hctx->tags->rqs[rq->tag] = rq;
> 
> The above mapping covers active, in-use requests, and it is the same mapping
> that is referred to in scsi_host_find_tag().
> After running some IOs, “data.hctx->tags->rqs[rq->tag]” will have some
> entries that are never reset by the block layer.
> 
> There would be a kernel panic if the request pointed to by
> “data.hctx->tags->rqs[rq->tag]” belongs to an “sdev” that has been removed,
> because all memory allocated for requests associated with
> that sdev might then be reused or inaccessible to the driver.
> Kernel panic snippet -
> 
> BUG: unable to handle kernel paging request at ffffff8000000010
> IP: [<ffffffffc048306c>] mpt3sas_scsih_scsi_lookup_get+0x6c/0xc0 [mpt3sas]
> PGD aa4414067 PUD 0
> Oops: 0000 [#1] SMP
> Call Trace:
>  [<ffffffffc046f72f>] mpt3sas_get_st_from_smid+0x1f/0x60 [mpt3sas]
>  [<ffffffffc047e125>] scsih_shutdown+0x55/0x100 [mpt3sas]

Other block drivers (e.g. ib_srp, skd) do not need this to work reliably.
It has been explained to you that the bug that you reported can be fixed
by modifying the mpt3sas driver. So why fix this by modifying the block
layer? Additionally, what prevents a race condition from occurring between
the block layer clearing hctx->tags->rqs[rq->tag] and scsi_host_find_tag()
reading that same array element? I'm afraid that this is an attempt to
paper over a real problem instead of fixing the root cause.
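To make that race concrete, here is a minimal single-threaded C sketch. It
is not kernel code: struct request, rqs[], get_driver_tag(),
put_driver_tag(), find_tag() and racy_lookup() are hypothetical
simplifications standing in for hctx->tags->rqs[], blk_mq_get_driver_tag()
and scsi_host_find_tag(), and the two-CPU interleaving is replayed
sequentially rather than run concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS 4 /* hypothetical queue depth */

/* Hypothetical, heavily simplified request; "alive" models whether the
 * memory behind it is still valid (e.g. its sdev has not been removed). */
struct request {
	int tag;
	bool alive;
};

/* Hypothetical stand-in for hctx->tags->rqs[]: one atomic slot per tag. */
static struct request *_Atomic rqs[NR_TAGS];

/* Models the rqs[rq->tag] = rq assignment in blk_mq_get_driver_tag(). */
static void get_driver_tag(struct request *rq)
{
	atomic_store(&rqs[rq->tag], rq);
}

/* Models tag release. clear_mapping selects the behaviour proposed in the
 * patch (reset the slot) versus the current one (leave the slot as is). */
static void put_driver_tag(struct request *rq, bool clear_mapping)
{
	if (clear_mapping)
		atomic_store(&rqs[rq->tag], NULL);
}

/* Models a scsi_host_find_tag()-style lookup. */
static struct request *find_tag(int tag)
{
	return atomic_load(&rqs[tag]);
}

/* Replays the problematic interleaving: the reader loads the slot, then
 * the block layer clears it and the request memory is torn down. Returns
 * the pointer the reader is left holding. */
static struct request *racy_lookup(struct request *rq)
{
	get_driver_tag(rq);
	struct request *seen = find_tag(rq->tag); /* reader: load pointer */
	put_driver_tag(rq, true);                 /* block layer: clear slot */
	rq->alive = false;                        /* request memory reused */
	return seen;                              /* reader: about to use it */
}
```

Even though every load and store of the slot is atomic, a reader that
loaded the pointer before the slot was cleared still ends up with a request
whose memory may have been reused. Clearing the slot narrows the window but
does not close it without additional synchronization held across both the
lookup and the use of the request.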

Bart.
