Hi Keith Sorry for bothering you again.
On 02/07/2018 10:03 AM, jianchao.wang wrote: > Hi Keith > > Thanks for your time and kindly response on this. > > On 02/06/2018 11:13 PM, Keith Busch wrote: >> On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote: >>> Hi Keith >>> >>> Thanks for your kindly response. >>> >>> On 02/05/2018 11:13 PM, Keith Busch wrote: >>>> but how many requests are you letting enter to their demise by >>>> freezing on the wrong side of the reset? >>> >>> There are only two difference with this patch from the original one. >>> 1. Don't freeze the queue for the reset case. At the moment, the >>> outstanding requests will be requeued back to blk-mq queues. >>> The new entered requests during reset will also stay in blk-mq queues. >>> All this requests will not enter into nvme driver layer >>> due to quiescent request_queues. And they will be issued after the reset >>> is completed successfully. >>> 2. Drain the request queue before nvme_dev_disable. This is nearly same >>> with the previous rule which will also unquiesce the queue >>> and let the requests be able to be drained. The only difference is this >>> patch will invoke wait_freeze in nvme_dev_disable instead >>> of nvme_reset_work. >>> >>> We don't sacrifice any request. This patch do the same thing with the >>> previous one and make things clearer. >> >> No, what you're proposing is quite different. What's the difference ? Can you please point out. I have shared my understanding below. But actually, I don't get the point what's the difference you said. Or what you refer to is about the 4th patch ? If yes, I also explain this below. Really appreciate your precious time to explain this. :) Many thanks Jianchao >> >> By "enter", I'm referring to blk_queue_enter. > When a request is allocated, it will hold a request_queue->q_usage_counter > until it is freed. > Please refer to > blk_mq_get_request -> blk_queue_enter_live > blk_mq_free_request -> blk_exit_queue > > Regarding to 'request enters into an hctx', I cannot get the point. > I think you should mean it enter into nvme driver layer. > >> Once a request enters >> into an hctx, it can not be backed out to re-enter a new hctx if the >> original one is invalidated. > > I also cannot get the point here. We certainly will not issue a request which > has been issued to other hctx. > What this patch and also the original one does is that disable/shutdown > controller, > cancel and requeue or fail the outstanding requests. > > The requeue mechanism will ensure the requests to be inserted to the ctx > where req->mq_ctx->cpu > points to. > >> >> Prior to a reset, all requests that have entered the queue are committed >> to that hctx, > > A request could be on > - blk-mq per-cpu ctx->rq_list, IO scheduler list\ > - hctx->dispatch list or > - request_queue->requeue_list (will be inserted to 1st case again) > > When requests are issued, they will be dequeued from 1st or 2nd case and > submitted to nvme driver layer. > These requests are _outstanding_ ones. > > When the request queue is quiesced, the request will be stayed in blk-mq > per-cpu ctx->rq_list, IO scheduler list > or hctx->dispatch list, and cannot be issued to driver layer any more. > When the request queue is frozen, it will gate the bio out of > generic_make_request, so new request cannot enter > blk-mq layer any more, and certainly the nvme driver layer. > > For the reset case, the nvme controller will be back soon, we needn't freeze > the queue, just quiescing is enough. > The outstanding ones will be canceled and _requeued_ to > request_queue->requeue_list, then they will be inserted into > blk-mq layer again by requeue_work. When reset_work completes and start > queues again, all the requests will be > issued again. :) > > For the shutdown case, freezing and quiescing is safer. Also we will wait > them to be completed if the controller is still > alive. If dead, we need to fail them directly instead of requeue them, > otherwise, IO hung will come up, because controller > will be offline for some time. > > > and we can't do anything about that. The only thing we can >> do is prevent new requests from entering until we're sure that hctx is >> valid on the other side of the reset. >> > yes, that's is what this patch does. > > Add some explaining about the 4th patch nvme-pci: break up nvme_timeout and > nvme_dev_disable here. > Also thanks for your time to look into this. That's really appreciated! > > The key point is blk_abort_request. It will force the request to be expired > and then handle the request > in timeout work context. It is safe to race with the irq completion path. > This is the most important reason > to use blk_abort_request. > We don't _complete_ the request or _rearm_ the time, but just set a CANCELED > flag. So request will not be freed. > Then these requests cannot be taken away by irq completion path and time out > path is also be avoid (no outstanding requests any more). > So we say 'all the outstanding requests are grabbed'. When we close the > controller totally, we could complete > these requests safely. This is the core idea of the 4th patch. > > Many thanks > Jianchao > >