On 07.09.2016 17:53, Alan Stern wrote:
On Wed, 7 Sep 2016, Mathias Nyman wrote:

I'm still seeing occasional problems. For example, when I unplugged the dock 
last night, it seems to have wedged some things, and then plugging it back in 
didn't work. See some logs below.


I ran a show-blocked-tasks after plugging the dock back in:


It looks like usb_hub_wq is trying to handle the disconnect event at the
same time as the PCI remove code is removing the xhci hosts (and connected
devices).

Sep  7 09:03:30 fred kernel: [83879.383356] Workqueue: usb_hub_wq hub_event
Sep  7 09:03:30 fred kernel: [83879.383395] Call Trace:
Sep  7 09:03:30 fred kernel: [83879.383416]  [<ffffffff81855fa5>] schedule+0x35/0x80
Sep  7 09:03:30 fred kernel: [83879.383427]  [<ffffffff8163433d>] usb_kill_urb+0x8d/0xc0
Sep  7 09:03:30 fred kernel: [83879.383444]  [<ffffffff810c4490>] ? wake_atomic_t_function+0x60/0x60
Sep  7 09:03:30 fred kernel: [83879.383454]  [<ffffffff81633076>] usb_hcd_flush_endpoint+0x126/0x190
Sep  7 09:03:30 fred kernel: [83879.383465]  [<ffffffff81635fbb>] usb_disable_endpoint+0x9b/0xb0


Sep  7 09:03:30 fred kernel: [83879.383686] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Sep  7 09:03:30 fred kernel: [83879.383717] Call Trace:
Sep  7 09:03:30 fred kernel: [83879.383728]  [<ffffffff81855fa5>] schedule+0x35/0x80
Sep  7 09:03:30 fred kernel: [83879.383738]  [<ffffffff8185624e>] schedule_preempt_disabled+0xe/0x10
Sep  7 09:03:30 fred kernel: [83879.383748]  [<ffffffff81857ea9>] __mutex_lock_slowpath+0xb9/0x130
Sep  7 09:03:30 fred kernel: [83879.383758]  [<ffffffff81857f3f>] mutex_lock+0x1f/0x30
Sep  7 09:03:30 fred kernel: [83879.383766]  [<ffffffff8162b951>] usb_disconnect+0x51/0x280
Sep  7 09:03:30 fred kernel: [83879.383776]  [<ffffffff816314f0>] usb_remove_hcd+0xd0/0x240

First guess would be there is something wrong with killing the urb.
usb_hub_wq takes the roothub device lock first, and then ends up waiting for 
usb_kill_urb forever.

I agree.  Probably xhci-hcd is waiting for the controller to do
something before it will give back the cancelled URB.  But since the
controller has been removed, it never does anything.

This would block the pci remove path when usb_remove_hcd calls
usb_disconnect, which tries to take the roothub lock as well.

Doing a usbfs read on a usb device also takes the roothub device lock,
which could explain why lsusb is blocked.

This is just an idea; I need to check the code in more detail to see if
it's a possible cause.

ehci-hcd includes checks in several places for ehci->rh_state ==
RH_STATE_RUNNING.  The removal pathway sets ehci->rh_state to
RH_STATE_HALTED.  As a result, the driver avoids waiting for things
that will never happen.
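
As a rough illustration of the pattern described above (not the actual
ehci-hcd code), the sketch below models a driver whose removal path marks
the root hub as halted, and whose URB cancel path checks that state before
deciding to wait for the hardware. The demo_hcd structure, the demo_*
functions and the enum values are hypothetical names chosen for this
example; only the RH_STATE_RUNNING / RH_STATE_HALTED idea comes from the
text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical root-hub states, named after the states mentioned above. */
enum rh_state { RH_STATE_HALTED, RH_STATE_SUSPENDED, RH_STATE_RUNNING };

/* Illustrative stand-in for a host controller driver's private data. */
struct demo_hcd {
	enum rh_state rh_state;
	int hw_requests;      /* cancel requests handed to the (simulated) hardware */
	int urbs_given_back;  /* URBs completed directly by the driver */
};

/* Removal path: record that the controller is gone before tearing down. */
void demo_hcd_remove(struct demo_hcd *hcd)
{
	assert(hcd != NULL);
	hcd->rh_state = RH_STATE_HALTED;
}

/*
 * Cancel one URB. Returns true if the URB was given back immediately,
 * false if the driver asked the hardware and must wait for it to respond.
 */
bool demo_urb_dequeue(struct demo_hcd *hcd)
{
	assert(hcd != NULL);
	if (hcd->rh_state != RH_STATE_RUNNING) {
		/* The hardware will never answer, so do not wait for it. */
		hcd->urbs_given_back++;
		return true;
	}
	hcd->hw_requests++;
	return false;
}
```

The point of the state check is that the decision not to wait is made from
driver-side state alone, so it still works once the device can no longer
be reached.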


Yes, it seems there are two things that need to be done for xhci here.

The first part is to do in xhci_urb_dequeue something similar to what ehci
does: make sure the host is alive before queuing any stop endpoint
commands. It does check whether PCI reads return 0xffffffff or the host is
XHCI_STATE_DYING, but we could detect a removal a lot earlier.

The second part is to make sure that the canceled URB is given back if the
stop endpoint command times out. Currently the
xhci_stop_endpoint_command_watchdog() function may return without giving
back canceled URBs, causing usb_kill_urb() to wait on
wait_event(usb_kill_urb_queue, ..) forever with locks held, blocking the
PCI remove thread.
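
To make the second part concrete, here is a minimal user-space sketch of
the intended behaviour, not the xhci-hcd implementation: when the watchdog
fires and the stop endpoint command has not completed, every canceled URB
on the endpoint is given back so nothing is left waiting on it. All names
here (demo_endpoint, demo_stop_ep_watchdog, the status enum and the
command_completed flag) are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_URBS 8

/* Hypothetical per-URB state for this model. */
enum demo_urb_status {
	DEMO_URB_PENDING,
	DEMO_URB_CANCELLED,   /* cancel requested, waiting for stop endpoint */
	DEMO_URB_GIVEN_BACK   /* returned to the submitter */
};

struct demo_endpoint {
	enum demo_urb_status urbs[DEMO_MAX_URBS];
	size_t num_urbs;
};

/* Give back every canceled URB on the endpoint; returns how many. */
size_t demo_giveback_cancelled(struct demo_endpoint *ep)
{
	size_t i, given = 0;

	assert(ep != NULL && ep->num_urbs <= DEMO_MAX_URBS);
	for (i = 0; i < ep->num_urbs; i++) {
		if (ep->urbs[i] == DEMO_URB_CANCELLED) {
			ep->urbs[i] = DEMO_URB_GIVEN_BACK;
			given++;
		}
	}
	return given;
}

/*
 * Watchdog for a stop endpoint command. If the command completed in the
 * meantime, the completion path owns the give-back and nothing is done
 * here. Otherwise the command timed out, and the watchdog must not return
 * while canceled URBs are still outstanding.
 */
size_t demo_stop_ep_watchdog(struct demo_endpoint *ep, int command_completed)
{
	if (command_completed)
		return 0;
	return demo_giveback_cancelled(ep);
}
```

In this model, a waiter corresponding to usb_kill_urb() would only need to
see the URB reach DEMO_URB_GIVEN_BACK, which the timeout path now
guarantees.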

I'll start writing a patch

-Mathias