Hi,

>> Root Cause
>> - Block layer timeout happens after power off UAS USB device which is
>> accessed as reproduce step. During timeout error handler process, scsi host
>> state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot
>> be released. And in final, usb subsystem hangs up.
>> Follow is function call:
>> blk_mq_timeout_work
>> …->scsi_times_out (… means some functions are not listed before this
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY)
>> … -> scsi_error_handler
>> …-> uas_eh_device_reset_handler
>> -> usb_lock_device_for_reset <- take lock
>> -> usb_reset_device
>> …-> rebind = uas_post_reset (return 1 since ENODEV)
>> …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>> …-> uas_disconnect (scsi_host_set_state to
>> SHOST_CANCEL_RECOVERY)
>> … -> scsi_queue_rq
>
>How does scsi_queue_rq get called here? As far as I can see, this shouldn't
>happen.
We confirmed this call path on Linux 4.9, which is the kernel where the
problem occurred and the one we are working on. In Linux 4.9, the last
function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we
think scsi_queue_rq is reached through the following path:

uas_disconnect
 |- scsi_remove_host
  |- scsi_forget_host
   |- __scsi_remove_device
    |- device_del
     |- bus_remove_device
      |- device_release_driver
       |- device_release_driver_internal
        |- __device_release_driver
         |- drv->remove(dev) (sd_remove)
          |- sd_shutdown
           |- sd_sync_cache
            |- scsi_execute
             |- __scsi_execute
              |- blk_execute_rq
               |- blk_execute_rq_nowait
                |- blk_mq_sched_insert_request
                 |- blk_mq_run_hw_queue
                  |- __blk_mq_delay_run_hw_queue
                   |- __blk_mq_run_hw_queue
                    |- blk_mq_sched_dispatch_requests
                     |- blk_mq_dispatch_rq_list
                      |- q->mq_ops->queue_rq (scsi_queue_rq)

>> Countermeasure
>> - Make uas_post_reset doesn’t return 1 when ENODEV returns from
>> uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces
>> doesn’t need to do unbind/rebind operations in this situation.
>> blk_mq_timeout_work
>> …->scsi_times_out (… means some functions are not listed before this
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY)
>> … -> scsi_error_handler
>> …-> uas_eh_device_reset_handler (*1)
>> -> usb_lock_device_for_reset <- take lock
>> -> usb_reset_device
>> -> usb_reset_and_verify_device (return ENODEV and FAILED will
>> be reported to *1)
>> -> uas_post_reset returns 0 when ENODEV => rebind=0
>> -> usb_unbind_and_rebind_marked_interfaces (rebind=0)
>
>The difference is that uas_disconnect wasn't called here. But that routine
>should not cause any problems -- you're always supposed to be able to unbind a
>driver from a device. So it looks like this is not the right way to solve the
>problem.

We confirmed that usb_driver_release_interface calls usb_unbind_interface
when this problem occurs, and usb_unbind_interface in turn calls the
driver's disconnect callback (uas_disconnect).

Regards,
Kento Kobayashi