Hi Max

That's really appreciated!
Here are my test scripts.

loop_reset_controller.sh

#!/bin/bash
while true
do
    echo 1 > /sys/block/nvme0n1/device/reset_controller
    sleep 1
done

loop_unbind_driver.sh

#!/bin/bash
while true
do
    echo "0000:02:00.0" > /sys/bus/pci/drivers/nvme/unbind
    sleep 2
    echo "0000:02:00.0" > /sys/bus/pci/drivers/nvme/bind
    sleep 2
done

loop_io.sh

#!/bin/bash
file="/dev/nvme0n1"
echo $file
while true;
do
    if [ -e $file ];then
        fio fio_job_rand_read.ini
    else
        echo "Not found"
        sleep 1
    fi
done

The fio job is as below:

size=512m
rw=randread
bs=4k
ioengine=libaio
iodepth=64
direct=1
numjobs=16
filename=/dev/nvme0n1
group_reporting

I started them in sequence: loop_io.sh, loop_reset_controller.sh, loop_unbind_driver.sh.
And if lucky, I will get an io hang within 3 minutes. ;)
Such as:

[  142.858074] nvme nvme0: pci function 0000:02:00.0
[  144.972256] nvme nvme0: failed to mark controller state 1
[  144.972289] nvme nvme0: Removing after probe failure status: 0
[  185.312344] INFO: task bash:1673 blocked for more than 30 seconds.
[  185.312889]       Not tainted 4.17.0-rc1+ #6
[  185.312950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  185.313049] bash            D    0  1673   1629 0x00000080
[  185.313061] Call Trace:
[  185.313083]  ? __schedule+0x3de/0xac0
[  185.313103]  schedule+0x3c/0x90
[  185.313111]  blk_mq_freeze_queue_wait+0x44/0x90
[  185.313123]  ? wait_woken+0x90/0x90
[  185.313133]  blk_cleanup_queue+0xe1/0x280
[  185.313145]  nvme_ns_remove+0x1c8/0x260
[  185.313159]  nvme_remove_namespaces+0x7f/0xa0
[  185.313170]  nvme_remove+0x6c/0x130
[  185.313181]  pci_device_remove+0x36/0xb0
[  185.313193]  device_release_driver_internal+0x160/0x230
[  185.313205]  unbind_store+0xfe/0x150
[  185.313219]  kernfs_fop_write+0x114/0x190
[  185.313234]  __vfs_write+0x23/0x150
[  185.313246]  ? rcu_read_lock_sched_held+0x3f/0x70
[  185.313252]  ? preempt_count_sub+0x92/0xd0
[  185.313259]  ? __sb_start_write+0xf8/0x200
[  185.313271]  vfs_write+0xc5/0x1c0
[  185.313284]  ksys_write+0x45/0xa0
[  185.313298]  do_syscall_64+0x5a/0x1a0
[  185.313308]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

And I got the following information in block debugfs:

root@will-ThinkCentre-M910s:/sys/kernel/debug/block/nvme0n1# cat hctx6/cpu6/rq_list
000000001192d19b {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=69, .internal_tag=-1}
00000000c33c8a5b {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=78, .internal_tag=-1}
root@will-ThinkCentre-M910s:/sys/kernel/debug/block/nvme0n1# cat state
DYING|BYPASS|NOMERGES|SAME_COMP|NONROT|IO_STAT|DISCARD|NOXMERGES|INIT_DONE|NO_SG_MERGE|POLL|WC|FUA|STATS|QUIESCED

We can see there were reqs on the ctx rq_list while the request_queue is QUIESCED.

Thanks again !!
Jianchao

On 04/22/2018 10:48 PM, Max Gurtovoy wrote:
>
>
> On 4/22/2018 5:25 PM, jianchao.wang wrote:
>> Hi Max
>>
>> No, I only tested it on PCIe one.
>> And sorry for that I didn't state that.
>
> Please send your exact test steps and we'll run it using RDMA transport.
> I also want to run a mini regression on this one since it may effect other
> flows.
>
>>
>> Thanks
>> Jianchao
>>
>> On 04/22/2018 10:18 PM, Max Gurtovoy wrote:
>>> Hi Jianchao,
>>> Since this patch is in the core, have you tested it using some fabrics
>>> drives too ? RDMA/FC ?
>>>
>>> thanks,
>>> Max.
>>>
>>> On 4/22/2018 4:32 PM, jianchao.wang wrote:
>>>> Hi keith
>>>>
>>>> Would you please take a look at this patch.
>>>>
>>>> This issue could be reproduced easily with a driver bind/unbind loop,
>>>> a reset loop and a IO loop at the same time.
>>>>
>>>> Thanks
>>>> Jianchao
>>>>
>>>> On 04/19/2018 04:29 PM, Jianchao Wang wrote:
>>>>> There is race between nvme_remove and nvme_reset_work that can
>>>>> lead to io hang.
>>>>>
>>>>> nvme_remove                    nvme_reset_work
>>>>> -> change state to DELETING
>>>>>                                -> fail to change state to LIVE
>>>>>                                -> nvme_remove_dead_ctrl
>>>>>                                  -> nvme_dev_disable
>>>>>                                    -> quiesce request_queue
>>>>>                                  -> queue remove_work
>>>>> -> cancel_work_sync reset_work
>>>>> -> nvme_remove_namespaces
>>>>>   -> splice ctrl->namespaces
>>>>>                                nvme_remove_dead_ctrl_work
>>>>>                                -> nvme_kill_queues
>>>>>   -> nvme_ns_remove               do nothing
>>>>>     -> blk_cleanup_queue
>>>>>       -> blk_freeze_queue
>>>>> Finally, the request_queue is quiesced state when wait freeze,
>>>>> we will get io hang here.
>>>>>
>>>>> To fix it, unquiesce the request_queue directly before nvme_ns_remove.
>>>>> We have spliced the ctrl->namespaces, so nobody could access them
>>>>> and quiesce the queue any more.
>>>>>
>>>>> Signed-off-by: Jianchao Wang <jianchao.w.w...@oracle.com>
>>>>> ---
>>>>>  drivers/nvme/host/core.c | 9 ++++++++-
>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index 9df4f71..0e95082 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -3249,8 +3249,15 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>>>>      list_splice_init(&ctrl->namespaces, &ns_list);
>>>>>      up_write(&ctrl->namespaces_rwsem);
>>>>>
>>>>> -    list_for_each_entry_safe(ns, next, &ns_list, list)
>>>>> +    /*
>>>>> +     * After splice the namespaces list from the ctrl->namespaces,
>>>>> +     * nobody could get them anymore, let's unquiesce the request_queue
>>>>> +     * forcibly to avoid io hang.
>>>>> +     */
>>>>> +    list_for_each_entry_safe(ns, next, &ns_list, list) {
>>>>> +        blk_mq_unquiesce_queue(ns->queue);
>>>>>          nvme_ns_remove(ns);
>>>>> +    }
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
>>>>>
>>>>
>>>> _______________________________________________
>>>> Linux-nvme mailing list
>>>> linux-n...@lists.infradead.org
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dnvme&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=eQ9q70WFDS-d0s-KndBw8MOJvcBM6wuuKUNklqTC3h8&s=oBasfz9JoJw4yQF4EaWcNfKChZ1HMCkfHVZqyjvYVHQ&e=
>>>>
>>>
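
For anyone re-running the reproducer, the debugfs check on the request_queue state could be scripted roughly as below. This is only a sketch under some assumptions: has_flag and check_queue are hypothetical helper names that are not part of the test scripts in this thread, debugfs is assumed to be mounted at /sys/kernel/debug, and the device name defaults to nvme0n1.

```shell
#!/bin/bash
# Sketch only: has_flag and check_queue are hypothetical helpers,
# not part of the original test scripts.

# Return 0 if the '|'-separated flag list in $1 contains the flag $2.
has_flag() {
    case "|$1|" in
        *"|$2|"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report when the request_queue of device $1 is both DYING and QUIESCED,
# the combination seen in the debugfs "state" output of the hang.
# Assumes debugfs is mounted at /sys/kernel/debug.
check_queue() {
    local dev="${1:-nvme0n1}"
    local state_file="/sys/kernel/debug/block/$dev/state"
    local state

    if [ ! -r "$state_file" ]; then
        echo "$dev: cannot read $state_file" >&2
        return 2
    fi
    state=$(cat "$state_file")
    if has_flag "$state" DYING && has_flag "$state" QUIESCED; then
        echo "$dev: request_queue is DYING but still QUIESCED"
        return 1
    fi
    return 0
}
```

Running "check_queue nvme0n1" as root after the hung task message shows up should print the warning line if the queue is in the same state as in the report. The block only defines functions, so it can be sourced from another script.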