Hi Keith,

Thanks for your precious time and kind response.

On 02/08/2018 11:15 PM, Keith Busch wrote:
> On Thu, Feb 08, 2018 at 10:17:00PM +0800, jianchao.wang wrote:
>> There is a dangerous scenario which caused by nvme_wait_freeze in
>> nvme_reset_work.
>> please consider it.
>>
>> nvme_reset_work
>> -> nvme_start_queues
>> -> nvme_wait_freeze
>>
>> if the controller no response, we have to rely on the timeout path.
>> there are issues below:
>> nvme_dev_disable need to be invoked.
>> nvme_dev_disable will quiesce queues, cancel and requeue and outstanding
>> requests.
>> nvme_reset_work will hang at nvme_wait_freeze
>
> We used to not requeue timed out commands, so that wasn't a problem
> before. Oh well, I'll take a look.
>

Yes, we indeed don't requeue the timed-out commands, but nvme_dev_disable
will requeue the other outstanding requests and quiesce the request queues.
This blocks nvme_reset_work->nvme_wait_freeze from moving forward.

As I shared in my last email, can we use (or abuse?) blk_set_preempt_only
to gate new bios in generic_make_request?
Freezing the queues is good, but wait_freeze in reset_work is a devil.

Many thanks
Jianchao