Hi Sagi

Just to add some supplementary notes here.
On 02/12/2018 10:16 AM, jianchao.wang wrote:
>> I think this is going in the wrong direction. Every state that is needed
>> to handle serialization should be done in core ctrl state. Moreover,
>> please try to avoid handling this locally in nvme-pci, place common
>> helpers in nvme-core and use them. I won't be surprised if fc would
>> need some of these.
>>
>> Also, can we try and not disable the controller from nvme_timeout?
> In fact, that is what this patch wants to do. For the previous
> outstanding requests, this patch returns BLK_EH_NOT_HANDLED and defers
> the work to nvme_dev_disable.
>
>> I'm not sure I understand why is this needed at all. What's wrong with
>> scheduling a controller reset/delete? Why is something like
>> nvme_pci_disable_ctrl_directly needed?
> Keith used to point out to me that we cannot complete and free a request
> before we close the controller and the PCI master bus, otherwise there
> will be something wrong with DMA access, because when we complete a
> request, the associated DMA mapping will be freed.
>
> For the previous outstanding requests, this patch could grab them with
> blk_abort_request in nvme_dev_disable, and complete them after we
> disable/shutdown the controller.
>
> But for the adminq requests issued in nvme_dev_disable and
> nvme_reset_work, we cannot do this. We cannot schedule another
> reset_work->nvme_dev_disable to do that, because we are already in it.
> So I use this nvme_pci_disable_ctrl_directly, which admittedly looks
> very ugly, to disable the controller; then we can complete the request
> with failure to move progress forward.
>
>> I'd like to see an effort to consolidate error handling paths rather
>> than enhancing the current skew...
> Yes, absolutely. That is also what I expect. :)
>
> This patch has two aspects:
> 1. Grab all the previous outstanding requests with blk_abort_request.
> This is safe when racing with the irq completion path. Then complete
> them after we disable/shutdown the controller completely.
> I think this part could be placed in the nvme ctrl core.

The 'grab' here is to avoid the timeout path being triggered during
nvme_dev_disable. This is important, because nvme_timeout may issue IOs
on the adminq or invoke nvme_pci_disable_ctrl_directly, either of which
could race with nvme_dev_disable. (A rough sketch of this idea follows
below my sign-off.)

> 2. Avoid nvme_timeout invoking nvme_dev_disable. This is the trickiest
> part.

And also, this will introduce some dangerous scenarios. I have reported
some of them before.

> As I shared above, we have to _disable_ the controller _before_ we
> complete the adminq requests issued from nvme_dev_disable and
> nvme_reset_work. Consequently, we cannot do as nvme_rdma_timeout does,
> i.e. schedule a recovery work and then return.

Actually, nvme_timeout has a pattern similar to nvme_rdma_timeout. When
an adminq/IOq request times out, we can schedule reset_work for it. But
if a request issued from the reset_work procedure itself times out, we
cannot schedule reset_work again. At that point we have to close the
controller directly and fail the requests. (See the second sketch
below.)

Looking forward to some advice on this. That would be really
appreciated.

Thanks
Jianchao
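P.S. To make the two points above more concrete, here are two rough,
untested sketches. They only illustrate the idea, they are not the
actual patch. The helper names nvme_abort_rq() and
nvme_grab_outstanding_requests() are made up for illustration, while
blk_abort_request(), blk_mq_tagset_busy_iter() and nvme_cancel_request()
are existing block/nvme-core interfaces; please double-check the
signatures against your base tree.

/*
 * Sketch 1: grab the outstanding requests before disabling the
 * controller, then complete them only after the device can no longer
 * master the bus, so their DMA mappings are not freed too early.
 * Assumes the nvme-pci context of drivers/nvme/host/pci.c.
 */
static void nvme_abort_rq(struct request *req, void *data, bool reserved)
{
        /* hand the request over to the timeout path; blk_abort_request()
         * is safe against a racing irq completion */
        blk_abort_request(req);
}

static void nvme_grab_outstanding_requests(struct nvme_dev *dev)
{
        blk_mq_tagset_busy_iter(&dev->tagset, nvme_abort_rq, NULL);
        blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_abort_rq, NULL);
}

/*
 * Conceptually, nvme_dev_disable() would then do:
 *
 *        nvme_grab_outstanding_requests(dev);
 *        ... quiesce the queues, disable/shutdown the controller ...
 *        blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request,
 *                                &dev->ctrl);
 *        blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request,
 *                                &dev->ctrl);
 *
 * so no request is completed (and no DMA mapping freed) before the
 * controller is closed.
 */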
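Sketch 2 shows how the two cases in nvme_timeout() could look, mirroring
the nvme_rdma_timeout() pattern for normal requests. Again only an
illustration: nvme_pci_disable_ctrl_directly() is the helper from the
patch under discussion (its exact signature is assumed here), the
RESETTING state check is just one possible way to detect requests issued
from the reset path, and the return values are the pre-4.18
blk_eh_timer_return ones.

static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
{
        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
        struct nvme_dev *dev = iod->nvmeq->dev;

        /*
         * If the controller is resetting, this request was issued by
         * nvme_dev_disable()/nvme_reset_work() itself, so scheduling
         * another reset cannot make progress.  Close the controller
         * right here and fail the request instead.
         */
        if (dev->ctrl.state == NVME_CTRL_RESETTING) {
                nvme_pci_disable_ctrl_directly(dev);
                nvme_req(req)->status = NVME_SC_ABORT_REQ;
                return BLK_EH_HANDLED;
        }

        /*
         * Normal adminq/IOq timeout: behave like nvme_rdma_timeout(),
         * schedule the recovery (reset_work) and let nvme_dev_disable()
         * complete this request later.
         */
        nvme_reset_ctrl(&dev->ctrl);
        return BLK_EH_NOT_HANDLED;
}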