Hi Kristian,

On 29/11/2013 16:41, Kristian Evensen wrote:
> Hello,
> 
> I am currently working on an embedded project based on the Atheros
> AR9344 SoC. As a prototype device, we are using the TP-Link TL-WDR4300
> router (http://wiki.openwrt.org/toh/tp-link/tl-wdr4300) and latest
> OpenWRT trunk. The kernel is 3.10.18.
> 
> We have over the last couple of weeks experienced a USB problem that
> we have not been able to solve. The USB hub works fine most of the
> time, but when event X happens, USB becomes unusable for extended
> periods of time. We have to disable/enable the power on the USB port
> (using GPIO) and then wait until a timeout expires/queue is flushed.
> 
> The devices we have been able to trigger event X with are different
> 3G/LTE modems. We have not been able to figure out exactly what
> triggers the event, but it happens when we move into areas with poor
> or no coverage and then move back into coverage. We see the error both
> with QMI-modems (qmi_wwan driver), AT-modems (option_serial driver)
> and WebUI-modems (cdc_ether driver). When looking in dmesg after this
> event has happened, the following messages appear based on the modem
> type:
> 
> QMI:
> Thu Nov 21 09:44:53 2013 kern.err kernel: [  490.600000] qmi_wwan
> 1-1.1.2:1.4: nonzero urb status received: -71
> Thu Nov 21 09:44:53 2013 kern.err kernel: [  490.600000] qmi_wwan
> 1-1.1.2:1.4: wdm_int_callback - 0 bytes
> 
> Serial:
> [62979.280000] option1 ttyUSB7: option_instat_callback: error -71
> 
> WebUI:
> [ 1192.680000] hub 1-1:1.0: cannot reset port 1 (err = -71)
> [ 1192.690000] hub 1-1:1.0: Cannot enable port 1.  Maybe the USB cable is bad?
> 
> The common denominator seems to be the -71 error code, which is a
> generic Protocol Error if I have understood correctly. When I search
> for this error code, it seems that most problems have been due to
> power. However, this does not seem to be the issue here. The modems are
> connected to an active hub and event X happens with only a single
> modem connected, so it seems unlikely that it is power.
> 
> In order to rule out the TP-Link router, we have also tested with
> another router based on the same SoC (Netgear WNDR4300). The same
> issue is seen. We also made some tests on a device with a different
> SoC (Raspberry Pi, BCM2835) and do not see this issue.
> 
> We have mostly focused on the QMI modems and when using dynamic
> debugging, dmesg also contains these errors (repeated many times):
> [ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/1514 retry 26
> [ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/64 retry 14
> 
> Each packet is, as expected, retried 32 times. The data we sent when
> these messages appeared was normal TCP traffic, which explains the
> packet sizes. If we leave the router alone long enough, it is able to
> restart the modems (they disconnect and then connect). However, this
> can take many minutes (I guess the packet queue has to be flushed?),
> and while this happens the USB hub is blocked (no traffic can pass
> through it).
> 
> When running usbmon, we see the following around the time of the crash
> (with QMI modem):
> 
> 86abea80 1428742032 S Bi:1:115:7 -150 1514 <
> 86abeb00 1428801536 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000
> 08004500 00d4bba7 4000fd06 08728027 245d2e0f
> 86abeb00 1428801554 S Bi:1:115:7 -150 1514 <
> 84895c00 1428802518 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93
> 08004500 00349c42 40003f06 e6772e0f e6768027
> 84895c00 1428802660 C Bo:1:115:5 0 66 >
> 86abeb80 1428982112 C Bi:1:115:7 0 1354 = 024b322c fd930250 f3000000
> 08004500 053cbbaa 4000fd06 04078027 245d2e0f
> 86abeb80 1428982141 S Bi:1:115:7 -150 1514 <
> 86abec00 1429021624 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000
> 08004500 00d4bbab 4000fd06 086e8027 245d2e0f
> 86abec00 1429021653 S Bi:1:115:7 -150 1514 <
> 84895480 1429022660 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93
> 08004500 00349c43 40003f06 e6762e0f e6768027
> 84895480 1429022746 C Bo:1:115:5 0 66 >
> 86b1dc00 1430690752 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1430690765 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1430690787 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1430691369 C Ci:1:115:0 0 39 = 01260080 03010400 0024001a
> 001e0400 9f0c0000 1d0200db 0e110200 01050106
> 86abec80 1430896349 C Bi:1:115:7 -71 0
> 84895800 1431014639 S Bi:1:115:7 -150 1514 <
> 86abed00 1431066817 C Bi:1:115:7 -71 0
> 84895480 1431184603 S Bi:1:115:7 -150 1514 <
> 86abed80 1431307124 C Bi:1:115:7 -71 0
> 86b03c00 1431330567 S Co:1:115:0 s 21 00 0000 0004 0012 18 = 01110000
> 03010000 01200005 00100200 ff00
> 86b03c00 1431331498 C Co:1:115:0 0 18 >
> 86b1dc00 1431332988 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431332996 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431333012 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431333484 C Ci:1:115:0 0 58 = 01390080 03010200 0120002d
> 00020400 00000000 01020092 05110400 01006e05
> 86b03c00 1431346879 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000
> 03010000 004d0000 00
> 86b03c00 1431347879 C Co:1:115:0 0 13 >
> 86b1dc00 1431348994 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431349002 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431349021 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431349490 C Ci:1:115:0 0 98 = 01610080 03010200 004d0055
> 00020400 00000000 12030000 00001303 00020200
> 86b03c00 1431363692 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000
> 03010000 00250000 00
> 86b03c00 1431367129 C Co:1:115:0 0 13 >
> 86b1dc00 1431369000 C Ii:1:115:6 0:16 8 = a1010000 04000000
> 86b03d80 1431369009 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
> 86b1dc00 1431369029 S Ii:1:115:6 -150:16 64 <
> 86b03d80 1431369622 C Ci:1:115:0 0 34 = 01210080 03010200 00250015
> 00020400 00000000 010b00f2 00020006 4e657443
> 84895380 1431424638 S Bi:1:115:7 -150 1514 <
> 86abee00 1431533084 C Bi:1:115:7 -71 0
> 84895f80 1431644606 S Bi:1:115:7 -150 1514 <
> 86abee80 1431773424 C Bi:1:115:7 -71 0
> 86abef00 1431859709 C Bi:1:115:7 -71 0
> 84895e80 1431884647 S Bi:1:115:7 -150 1514 <
> 84895d80 1431884669 S Bi:1:115:7 -150 1514 <
> 86abef80 1431891856 C Bi:1:115:7 -71 0
> 86b93e00 1431923867 C Bi:1:115:7 -71 0
> 86b1de00 1431955895 C Bi:1:115:7 -71 0
> 86b1d800 1431986895 C Bi:1:115:7 -71 0
> 84895000 1432004649 S Bi:1:115:7 -150 1514 <
> 84895f00 1432004672 S Bi:1:115:7 -150 1514 <
> 84895100 1432004690 S Bi:1:115:7 -150 1514 <
> 84895980 1432004699 S Bi:1:115:7 -150 1514 <
> 
> My knowledge about USB is very limited, so I am not able to make much
> sense of these messages. I have put the full log here:
> https://gist.github.com/kristrev/7705450.
> 
> My question is: has anyone experienced anything similar, and does anyone
> know how to solve this problem or have ideas on how to proceed? Since the
> error seems to be independent of drivers, I guess it points to this
> being hardware related. Would for example reducing QH_XACTERR_MAX be a
> possible (temporary) solution, or are there any ways to flush this
> queue once we see the error? The most critical part for us is that USB
> is blocked for such extended periods of time.
> 

Are your devices and hubs enumerated as full or high-speed?
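
One way to check this, assuming Python is available on the device (it is not part of a default OpenWrt image), is to read the "speed" attribute that the kernel exposes in sysfs for each USB device. The sketch below is only an illustration; the function names usb_speeds and speed_label are made up for this example.

```python
from pathlib import Path

# Values the kernel writes to the sysfs "speed" attribute, in Mbit/s.
SPEED_NAMES = {"1.5": "low", "12": "full", "480": "high", "5000": "super"}


def speed_label(raw):
    """Map a raw sysfs speed value to a USB speed name."""
    return SPEED_NAMES.get(raw, "unknown")


def usb_speeds(root="/sys/bus/usb/devices"):
    """Return {device name: raw speed string} for entries exposing 'speed'."""
    root = Path(root)
    if not root.is_dir():
        return {}
    result = {}
    for dev in sorted(root.iterdir()):
        speed_file = dev / "speed"
        # Interface entries such as 1-1:1.0 have no speed attribute.
        if speed_file.is_file():
            result[dev.name] = speed_file.read_text().strip()
    return result


if __name__ == "__main__":
    for name, raw in usb_speeds().items():
        print(f"{name}: {raw} Mbit/s ({speed_label(raw)}-speed)")
```

A value of 12 for the hub or a modem would mean it enumerated as full-speed, and 480 as high-speed.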

What happens if you turn off the WiFi during this time?

I am trying to link your problem with the AR9331 USB stability issues discussed
previously in the forum:
https://forum.openwrt.org/viewtopic.php?id=39956
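
To compare runs (for example with and without WiFi), it may help to count the failed completions per endpoint in the usbmon trace. The sketch below is a rough illustration that assumes the text format shown in the quoted trace (URB tag, timestamp, event type, address, status); the function name count_completion_status is made up for this example.

```python
from collections import Counter


def count_completion_status(lines, status=-71):
    """Count 'C' (completion) events with the given URB status, per address."""
    counts = Counter()
    for line in lines:
        # Drop mail quoting ("> ") so quoted traces can be fed in directly.
        fields = line.lstrip("> ").split()
        if len(fields) < 5 or fields[2] != "C":
            continue  # submissions, wrapped hex-data lines, blank lines
        # Interrupt URBs report "status:interval"; keep only the status part.
        status_word = fields[4].split(":")[0]
        try:
            value = int(status_word)
        except ValueError:
            continue  # not a numeric status, e.g. a setup tag
        if value == status:
            counts[fields[3]] += 1
    return counts


sample = [
    "> 86abec80 1430896349 C Bi:1:115:7 -71 0",
    "> 84895800 1431014639 S Bi:1:115:7 -150 1514 <",
    "> 86abed00 1431066817 C Bi:1:115:7 -71 0",
    "> 86b1dc00 1431332988 C Ii:1:115:6 0:16 8 = a1010000 04000000",
]
errors = count_completion_status(sample)
```

Passing the lines of the full log instead of the sample list gives the number of -71 completions for each address such as Bi:1:115:7.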

-- 
Michel
> Thanks in advance for any help,
> Kristian
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
> 