Hi Kristian,

On 29/11/2013 16:41, Kristian Evensen wrote:
> Hello,
>
> I am currently working on an embedded project based on the Atheros
> AR9344 SoC. As a prototype device, we are using the TP-Link TL-WDR4300
> router (http://wiki.openwrt.org/toh/tp-link/tl-wdr4300) and latest
> OpenWRT trunk. The kernel is 3.10.18.
>
> We have over the last couple of weeks experienced a USB problem that
> we have not been able to solve. The USB hub works fine most of the
> time, but when event X happens, USB becomes unusable for extended
> periods of time. We have to disable/enable the power on the USB port
> (using GPIO) and then wait until a timeout expires/queue is flushed.
>
> The devices we have been able to trigger event X with are different
> 3G/LTE modems. We have not been able to figure out exactly what
> triggers the event, but it happens when we move into areas with poor
> or no coverage and then move back into coverage. We see the error
> with QMI-modems (qmi_wwan driver), AT-modems (option_serial driver)
> and WebUI-modems (cdc_ether driver). When looking in dmesg after this
> event has happened, the following messages appear based on the modem
> type:
>
> QMI:
> Thu Nov 21 09:44:53 2013 kern.err kernel: [ 490.600000] qmi_wwan 1-1.1.2:1.4: nonzero urb status received: -71
> Thu Nov 21 09:44:53 2013 kern.err kernel: [ 490.600000] qmi_wwan 1-1.1.2:1.4: wdm_int_callback - 0 bytes
>
> Serial:
> [62979.280000] option1 ttyUSB7: option_instat_callback: error -71
>
> WebUI:
> [ 1192.680000] hub 1-1:1.0: cannot reset port 1 (err = -71)
> [ 1192.690000] hub 1-1:1.0: Cannot enable port 1. Maybe the USB cable is bad?
>
> The common denominator seems to be the -71 error code, which is a
> generic Protocol Error if I have understood correctly. When I search
> for this error code, it seems that most problems have been due to
> power. However, this does not seem to be the issue here. The modems are
> connected to an active hub and event X happens with only a single
> modem connected, so it seems unlikely that it is power.
>
> In order to rule out the TP-Link router, we have also tested with
> another router based on the same SoC (Netgear WNDR4300). The same
> issue is seen. We also made some tests on a device with a different
> SoC (Raspberry Pi, BCM2835) and do not see this issue.
>
> We have mostly focused on the QMI modems and when using dynamic
> debugging, dmesg also contains these errors (repeated many times):
> [ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/1514 retry 26
> [ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/64 retry 14
>
> Each packet is, as expected, retried 32 times. The data we sent when
> these messages appeared was normal TCP traffic, which explains the
> packet sizes. If we leave the router alone long enough, it is able to
> restart the modems (they disconnect and then connect). However, this
> can take many minutes (I guess the packet queue has to be flushed?),
> and while this happens the USB hub is blocked (no traffic can pass
> through it).
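A side note on the XactErr lines: to see how close each transfer gets to the retry limit, the retry counters can be pulled out of dmesg. This is only a minimal Python sketch, assuming the debug lines keep the "detected XactErr len A/B retry N" shape shown in your log; the helper name max_retries_by_length is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the EHCI debug line quoted above, e.g.
# "detected XactErr len 0/1514 retry 26"
XACTERR_RE = re.compile(r"detected XactErr len (\d+)/(\d+) retry (\d+)")


def max_retries_by_length(lines):
    """Return {requested_length: highest retry count seen} for XactErr lines."""
    worst = defaultdict(int)
    for line in lines:
        match = XACTERR_RE.search(line)
        if match is None:
            continue  # not an XactErr line
        requested = int(match.group(2))
        retry = int(match.group(3))
        worst[requested] = max(worst[requested], retry)
    return dict(worst)


if __name__ == "__main__":
    # The two lines from the report; in practice feed it the dmesg output.
    sample = [
        "[ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/1514 retry 26",
        "[ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/64 retry 14",
    ]
    for length, retry in sorted(max_retries_by_length(sample).items()):
        print(f"len {length}: max retry {retry}")
```

If the highest retry count regularly reaches the limit for both the 1514-byte and the 64-byte transfers, that would suggest the whole link to the device is failing rather than one particular endpoint.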
>
> When running usbmon, we see the following around the time of the crash
> (with QMI modem):
>
> 86abea80 1428742032 S Bi:1:115:7 -150 1514 <
> 86abeb00 1428801536 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000 08004500 00d4bba7 4000fd06 08728027 245d2e0f
> 86abeb00 1428801554 S Bi:1:115:7 -150 1514 <
> 84895c00 1428802518 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93 08004500 00349c42 40003f06 e6772e0f e6768027
> 84895c00 1428802660 C Bo:1:115:5 0 66 >
> 86abeb80 1428982112 C Bi:1:115:7 0 1354 = 024b322c fd930250 f3000000 08004500 053cbbaa 4000fd06 04078027 245d2e0f
> 86abeb80 1428982141 S Bi:1:115:7 -150 1514 <
> 86abec00 1429021624 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000 08004500 00d4bbab 4000fd06 086e8027 245d2e0f
> 86abec00 1429021653 S Bi:1:115:7 -150 1514 <
> 84895480 1429022660 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93 08004500 00349c43 40003f06 e6762e0f e6768027
> 84895480 1429022746 C Bo:1:115:5 0 66 >
> 86b1dc00 1430690752 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1430690765 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1430690787 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1430691369 C Ci:1:115:0 0 39 = 01260080 03010400 0024001a 001e0400 9f0c0000 1d0200db 0e110200 01050106
> 86abec80 1430896349 C Bi:1:115:7 -71 0
> 84895800 1431014639 S Bi:1:115:7 -150 1514 <
> 86abed00 1431066817 C Bi:1:115:7 -71 0
> 84895480 1431184603 S Bi:1:115:7 -150 1514 <
> 86abed80 1431307124 C Bi:1:115:7 -71 0
> 86b03c00 1431330567 S Co:1:115:0 s 21 00 0000 0004 0012 18 = 01110000 03010000 01200005 00100200 ff00
> 86b03c00 1431331498 C Co:1:115:0 0 18 >
> 86b1dc00 1431332988 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431332996 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431333012 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431333484 C Ci:1:115:0 0 58 = 01390080 03010200 0120002d 00020400 00000000 01020092 05110400 01006e05
> 86b03c00 1431346879 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000 03010000 004d0000 00
> 86b03c00 1431347879 C Co:1:115:0 0 13 >
> 86b1dc00 1431348994 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431349002 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431349021 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431349490 C Ci:1:115:0 0 98 = 01610080 03010200 004d0055 00020400 00000000 12030000 00001303 00020200
> 86b03c00 1431363692 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000 03010000 00250000 00
> 86b03c00 1431367129 C Co:1:115:0 0 13 >
> 86b1dc00 1431369000 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431369009 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431369029 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431369622 C Ci:1:115:0 0 34 = 01210080 03010200 00250015 00020400 00000000 010b00f2 00020006 4e657443
> 84895380 1431424638 S Bi:1:115:7 -150 1514 <
> 86abee00 1431533084 C Bi:1:115:7 -71 0
> 84895f80 1431644606 S Bi:1:115:7 -150 1514 <
> 86abee80 1431773424 C Bi:1:115:7 -71 0
> 86abef00 1431859709 C Bi:1:115:7 -71 0
> 84895e80 1431884647 S Bi:1:115:7 -150 1514 <
> 84895d80 1431884669 S Bi:1:115:7 -150 1514 <
> 86abef80 1431891856 C Bi:1:115:7 -71 0
> 86b93e00 1431923867 C Bi:1:115:7 -71 0
> 86b1de00 1431955895 C Bi:1:115:7 -71 0
> 86b1d800 1431986895 C Bi:1:115:7 -71 0
> 84895000 1432004649 S Bi:1:115:7 -150 1514 <
> 84895f00 1432004672 S Bi:1:115:7 -150 1514 <
> 84895100 1432004690 S Bi:1:115:7 -150 1514 <
> 84895980 1432004699 S Bi:1:115:7 -150 1514 <
>
> My knowledge about USB is very limited, so I am not able to make much
> sense of these messages. I have put the full log here:
> https://gist.github.com/kristrev/7705450.
>
> My question is, has anyone experienced anything similar and knows how
> to solve this problem, or has any ideas on how to proceed? Since the
> error seems to be independent of drivers, I guess it points to this
> being hardware related. Would for example reducing QH_XACTERR_MAX be a
> possible (temporary) solution, or are there any ways to flush this
> queue once we see the error? The most critical part for us is that USB
> is blocked for such extended periods of time.
>
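To get a quick overview of a long usbmon capture such as the one in your gist, the completion events can be tallied per endpoint and status. This is a rough Python sketch, assuming the usbmon text format seen above (URB tag, timestamp, event type, address, status); summarize_completions is a hypothetical helper name, not an existing tool.

```python
from collections import Counter


def summarize_completions(lines):
    """Count usbmon completion ("C") events per (address, status) pair."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expected fields: URB tag, timestamp, event type, address, status, ...
        if len(fields) < 5 or fields[2] != "C":
            continue  # skip submissions and malformed lines
        address = fields[3]
        # Interrupt transfers report "status:interval", e.g. "0:16".
        status_text = fields[4].split(":")[0]
        try:
            status = int(status_text)
        except ValueError:
            continue  # e.g. the "s" setup tag, which has no numeric status
        counts[(address, status)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the trace above.
    sample = [
        "86b1dc00 1430690752 C Ii:1:115:6 0:16 8 = a1010000 04000000",
        "86b03d80 1430690765 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <",
        "86abec80 1430896349 C Bi:1:115:7 -71 0",
        "84895800 1431014639 S Bi:1:115:7 -150 1514 <",
        "86abed00 1431066817 C Bi:1:115:7 -71 0",
    ]
    for (address, status), count in sorted(summarize_completions(sample).items()):
        print(f"{address} status {status}: {count}")
```

Run over the full log, this would show whether the -71 completions are confined to the bulk-in endpoint (Bi:1:115:7 in the excerpt) while the control and interrupt endpoints keep completing with status 0.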
Are your devices and hubs enumerated as full-speed or high-speed? What
happens if you turn off the WiFi during this time?

I am trying to link your problem with the AR9331 USB stability issues
discussed previously in the forum:
https://forum.openwrt.org/viewtopic.php?id=39956

--
Michel

> Thanks in advance for any help,
> Kristian
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel