Hello,

I am encountering an issue where electrical noise on USB devices causes the 
host ci_hdrc driver to stall.  The system is an i.MX6 board (UDOO) connected to 
a USB touchscreen, an SMSC95xx hub, an FTDI device, and a hi-speed camera.

Occasionally (after hours or days), or in a noisy environment, all the devices 
on the root hub stop working.  They still show up in debugfs, lsusb, etc., but 
any attempt to communicate with them or to reset them through /sys/bus/usb 
fails with error -110 (ETIMEDOUT) or -71 (EPROTO).

dmesg, ci_hdrc debugfs entries, and lsusb -v are posted here:
https://gist.github.com/cjgriscom/5238df9fbf7ffc4f558b37b5883f8398

Performing a bind/unbind on ci_hdrc with the following commands results in a 
successful reset:
 # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/unbind
 # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/bind
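
As a stopgap, the two commands above can be wrapped in a small helper.  This is 
only a sketch: the one-second pause between unbind and bind is an assumption 
rather than a measured requirement, and DRV_DIR, DEV, and REBIND_DELAY are 
names made up for illustration.  It must run as root.

```shell
#!/bin/sh
# Driver directory and device name as used in the commands above.
# Both can be overridden from the environment (hypothetical knobs).
DRV_DIR=${DRV_DIR:-/sys/bus/platform/drivers/ci_hdrc}
DEV=${DEV:-ci_hdrc.0}

# Unbind the controller from ci_hdrc, wait briefly, then bind it again.
# Returns non-zero if either write fails.
rebind_ci_hdrc() {
    echo "$DEV" > "$DRV_DIR/unbind" || return 1
    # Assumed settle time; not verified to be necessary.
    sleep "${REBIND_DELAY:-1}"
    echo "$DEV" > "$DRV_DIR/bind"
}
```

Calling rebind_ci_hdrc after the bus has locked up is equivalent to typing the 
two echo commands by hand.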

The issue seems to correlate strongly with a large "err" count on the irq line 
of /sys/kernel/debug/usb/ehci/ci_hdrc.0/registers, whereas under normal 
operation that count is very low.  For example:
  irq normal 1031800 err 199069 iaa 17040 (lost 0)
After the lockup, interrupts appear to stop firing, as the counts stop 
incrementing.
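
To watch these counters over time, something like the following could be used. 
It is a rough sketch that assumes the irq line keeps the field layout shown 
above; the function names and the polling interval are made up for 
illustration.  An unchanged "normal" count is only a hint of a stall, since an 
idle bus may also produce few interrupts.

```shell
#!/bin/sh
# Register dump path, as given above.
REGS=/sys/kernel/debug/usb/ehci/ci_hdrc.0/registers

# Read register dump text on stdin and print "<normal> <err>" taken from
# the first line of the form "irq normal N err M ...".
irq_counts() {
    awk '$1 == "irq" && $2 == "normal" && $4 == "err" { print $3, $5; exit }'
}

# Succeeds (returns 0) when two samples of the normal count are equal.
irq_stalled() {
    [ "$1" -eq "$2" ]
}

# Poll the counters every $1 seconds (default 5) and log them with a
# timestamp; warn on stderr when the normal count has not advanced.
watch_irq() {
    interval=${1:-5}
    prev=
    while :; do
        set -- $(irq_counts < "$REGS")
        [ $# -eq 2 ] || { echo "could not parse $REGS" >&2; return 1; }
        echo "$(date '+%Y-%m-%d %H:%M:%S') normal=$1 err=$2"
        if [ -n "$prev" ] && irq_stalled "$prev" "$1"; then
            echo "normal count unchanged for ${interval}s; possible stall" >&2
        fi
        prev=$1
        sleep "$interval"
    done
}
```

Running "watch_irq 10" as root with debugfs mounted logs one line every ten 
seconds until interrupted.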

I have not yet found a way to reproduce the error outside of the machine where 
it occurs.  Swapping hardware has not made a difference.  I have tried 
artificially inducing bit errors by manipulating the data lines of one of the 
attached USB ports, and while this creates a large number of errors, the bus is 
able to recover once it returns to normal operation.  The most reliable way I 
have found to reproduce the failure locally is to run a welder nearby, after 
which the driver usually fails within minutes.

I have seen the failure occur on the following kernels:
3.14
4.15.7
4.18.20
4.20.6
5.0-rc7

Similar reports:
This old bug report at NXP seems to describe the same issue: 
https://community.nxp.com/thread/355151
A similar issue seems to have been fixed in the dwc_otg driver: 
https://github.com/raspberrypi/linux/issues/552

Any help or pointers on how to get better logs and debug info would be 
appreciated.  I am able to recompile the kernel and test as needed on my end.

Thanks,
Chandler Griscom
