Hello, Jianchao.

On Tue, Dec 12, 2017 at 06:09:32PM +0800, jianchao.wang wrote:
> > @@ -786,18 +779,6 @@ static void blk_mq_rq_timed_out(struct request *req,
> >  bool reserved)
> >      const struct blk_mq_ops *ops = req->q->mq_ops;
> >      enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
> >
> > -    /*
> > -     * We know that complete is set at this point. If STARTED isn't set
> > -     * anymore, then the request isn't active and the "timeout" should
> > -     * just be ignored. This can happen due to the bitflag ordering.
> > -     * Timeout first checks if STARTED is set, and if it is, assumes
> > -     * the request is active. But if we race with completion, then
> > -     * both flags will get cleared. So check here again, and ignore
> > -     * a timeout event with a request that isn't active.
> > -     */
> > -    if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
> > -        return;
> > -
> >      if (ops->timeout)
> >          ret = ops->timeout(req, reserved);
>
> The BLK_EH_RESET_TIMER case has not been covered here. In that case,
> the timer will be re-armed, but the gstate and aborted_gstate are
> not updated and still equal with each other. Consequently, when the
> request is completed later, the __blk_mq_complete_request() will be
> missed, then the request will expire again. The aborted_gstate
> should be updated in the BLK_EH_RESET_TIMER case.

You're right.  This is inherently racy tho.  Nothing prevented the
command from completing before complete was cleared.  I'll just clear
aborted_gstate which should behave the same way.

Thanks.

-- 
tejun
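
For illustration only, here is a minimal user-space sketch of the case
discussed above. It is not the kernel code: struct model_rq and every
model_* name are hypothetical, and it assumes that an in-flight gstate
is never 0, so that 0 can stand for "no claim by the timeout path".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a request; not the kernel's struct request. */
struct model_rq {
	uint64_t gstate;         /* current generation + state, nonzero in flight */
	uint64_t aborted_gstate; /* gstate claimed by the timeout path, 0 = none */
	bool completed;
};

enum model_eh_ret { MODEL_EH_HANDLED, MODEL_EH_RESET_TIMER };

/* Completion is skipped while the timeout path still owns the request. */
static bool model_complete(struct model_rq *rq)
{
	if (rq->aborted_gstate == rq->gstate)
		return false;
	rq->completed = true;
	return true;
}

/*
 * Timeout path: claim the request, then act on the driver's verdict.
 * clear_on_reset selects whether the claim is dropped before the timer
 * would be re-armed (timer handling itself is not modelled).
 */
static void model_timed_out(struct model_rq *rq, enum model_eh_ret ret,
			    bool clear_on_reset)
{
	rq->aborted_gstate = rq->gstate;
	if (ret == MODEL_EH_RESET_TIMER && clear_on_reset)
		rq->aborted_gstate = 0;
}

/* Returns true if a completion arriving after RESET_TIMER is lost. */
static bool model_lost_completion(bool clear_on_reset)
{
	struct model_rq rq = { .gstate = 1, .aborted_gstate = 0,
			       .completed = false };

	assert(rq.gstate != 0); /* assumption stated in the lead-in */
	model_timed_out(&rq, MODEL_EH_RESET_TIMER, clear_on_reset);
	return !model_complete(&rq);
}
```

In this model, model_lost_completion(false) reports a lost completion,
while model_lost_completion(true) does not, which is the behaviour that
clearing aborted_gstate in the BLK_EH_RESET_TIMER case is meant to give.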