> From: Bin Liu [mailto:b-...@ti.com] > Hi, > > On Fri, May 06, 2016 at 10:41:46AM +0000, Andrew Goodbody wrote: > > > From: Bin Liu [mailto:b-...@ti.com] > > > On Thu, May 05, 2016 at 04:02:55PM +0000, Andrew Goodbody wrote: > > > > > From: Bin Liu [mailto:b-...@ti.com] On Thu, May 05, 2016 at > > > > > 03:12:00PM +0000, Andrew Goodbody wrote: > > > > > > > From: Bin Liu [mailto:b-...@ti.com] Hi, > > > > > > > > > > > > > > On Thu, May 05, 2016 at 12:22:33PM +0000, Andrew Goodbody > wrote: > > > > > > > > > From: Bin Liu [mailto:b-...@ti.com] Hi, > > > > > > > > > > > > > > > > > > On Wed, May 04, 2016 at 03:48:50PM +0000, Andrew > > > > > > > > > Goodbody > > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > I have been investigating communication issues with iPads. > > > > > > > > > > When the system is busy it seems that the musb driver > > > > > > > > > > is silently dropping occasional packets. I have a > > > > > > > > > > usbmon trace that does not show the packet and I have > > > > > > > > > > a trace from a hardware USB analyser that does show > > > > > > > > > > the packet. So the device is correctly sending the > > > > > > > > > > packet, it is even being ACKed, but it is not passed up to > > > > > > > > > > the > application. > > > > > > > > > > The packet is a bulk transfer packet of 20 bytes. Can > > > > > > > > > > anyone please give me pointers to where to go looking > > > > > > > > > > for the problem? The syslog shows > > > > > nothing relevant. > > > > > > > > > > > > > > > > > > What is the part number on the am3352 chip? > > > > > > > > > > > > > > > > AM3352BZCZ100 > > > > > > > > > > > > > > > > > What kernel version do you use? > > > > > > > > > > > > > > > > 4.5.0 > > > > > > > > > > > > > > > > > Is musb cppi dma enabled? If so, does the problem still > > > > > > > > > happen when CPPI disabled? > > > > > > > > > > > > > > > > Yes. Yes. When testing with PIO I did get the message "Rx > > > > > > > > interrupt with no > > > > > > > errors or packet!". > > > > > > > > > > > > > > > > > First try to turn on dynamic debug log in musb_host.c to > > > > > > > > > check if musb receives the packet or not. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > -Bin. > > > > > > > > > > > > > > > > I am having problems doing this. If I enable the whole > > > > > > > > file then I get lots of messages on the console about > > > > > > > > /dev/kmsg buffer overrun. There are more then 26 million > > > > > > > > packets in the hardware trace and I have not worked out > > > > > > > > how to correlate any of the possible message from dynamic > > > > > > > > debug with those packets even when I enable some of the > > > > > > > > dynamic debug lines. I can see a few messages about "DMA > > > > > > > > complete but packet still in FIFO, CSR 2103" and just the > > > > > occasional "extra TX2 ready, csr 2100" > > > > > > > > when I enable some of the lines for dynamic debug. > > > > > > > > > > > > > > Well, this issue would not be easy to debug. Is this with > > > > > > > your custom > > > > > board? > > > > > > > If so, have you run EyeDiagram test to rule out signal > > > > > > > integrity problem? Are you able to reproduce it with any TI > > > > > > > EVM, such as Beaglebone Black? If so, please explain the > > > > > > > detail of the test case, I could try to reproduce it on my side. > > > > > > > > > > > > Yes this is on a custom board and yes we did EyeDiagram tests. > > > > > > Also the ACK from the controller is seen, so that should rule > > > > > > out signal integrity issues. I have just reproduced this on > > > > > > the Beaglebone Black using the latest TI SDK. Do you have > > > > > > access to 16 iPads with lightning connectors and do you have a Mac > running 10.10.x? > > > > > > > > > > No, I don't have those :( > > > > > > > > > > 16 devices connecting to musb sounds too many. what is the ep > > > > > info in the descriptor of the ipad device? > > > > > > > > T: Bus=02 Lev=04 Prnt=06 Port=06 Cnt=07 Dev#= 19 Spd=480 MxCh= 0 > > > > D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 4 > > > > P: Vendor=05ac ProdID=12ab Rev= 3.40 > > > > S: Manufacturer=Apple Inc. > > > > S: Product=iPad > > > > S: SerialNumber=1da5f4610eafb36fa1e9eead80a56cb2db11dfce > > > > C: #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=500mA > > > > I: If#= 0 Alt= 0 #EPs= 3 Cls=06(still) Sub=01 Prot=01 Driver= > > > > E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=1250us > > > > E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=64ms > > > > C: #Ifs= 3 Cfg#= 2 Atr=c0 MxPwr=500mA > > > > I: If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=00 Driver= > > > > I: If#= 1 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=00 Driver= > > > > I: If#= 1 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver= > > > > E: Ad=81(I) Atr=01(Isoc) MxPS= 192 Ivl=1ms > > > > I: If#= 2 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver= > > > > E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=125us > > > > C:* #Ifs= 2 Cfg#= 3 Atr=c0 MxPwr=500mA > > > > I:* If#= 0 Alt= 0 #EPs= 3 Cls=06(still) Sub=01 Prot=01 > > > > Driver=usbfs > > > > E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=1250us > > > > E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=64ms > > > > I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=fe Prot=02 > > > > Driver=usbfs > > > > E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > C: #Ifs= 3 Cfg#= 4 Atr=c0 MxPwr=500mA > > > > I: If#= 0 Alt= 0 #EPs= 3 Cls=06(still) Sub=01 Prot=01 Driver= > > > > E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=1250us > > > > E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=64ms > > > > I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=fe Prot=02 Driver= > > > > E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > I: If#= 2 Alt= 0 #EPs= 0 Cls=ff(vend.) Sub=fd Prot=01 Driver= > > > > I: If#= 2 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=fd Prot=01 Driver= > > > > E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > I: If#= 2 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=fd Prot=01 Driver= > > > > E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms > > > > > > > > Config #4 is the relevant one. Interface #0 quickly uses up the > > > > controller endpoints and then starts to return -ENOSPC thereafter > > > > for the interrupt URBs from later iPads, but that does not matter > > > > for my use case. Interface #1 is the one that is actually used, > > > > just a bulk in and a bulk out endpoint. So some URBs get opened on > > > > controller endpoints 2-15, but most are just queued up on controller > endpoint 1. > > > > Yes this will keep the controller busy. > > > > > > Well, I am not sure musb is able to support your use case. As you > > > just said, the required endpoints are more than musb provides, so > > > some urbs are queued to ep1, which is not ideal, but the driver is just > designed that way. > > > > What are you not sure about? Why do you think that there could be an > issue? > > I didn't have a use case to investigate this scenario, but think about the > case > that multiple urbs to different devices are queued into a single musb hw ep, > if for whatever reason one urb completion is delayed, other queued urbs will > not be executed, it would cause timeout in the driver to other devices. I > don't know if it would cause other problem though, since I don't have a setup > to look at this scenario. > > > No, the queuing on ep1 is not ideal, but as you say the driver is > > designed that way and so it should either work or else return an > > error. Silently dropping URBs is a bug and is not acceptable behaviour > > in a driver. > > We don't know who drops the packet yet, could be the driver, or could be > the hw too. > > > > > > Honestly, I haven't checked what would happen if some bulk endpoint > > > urbs are queued to ep1. > > > > ! > > Well on the USBMON trace I can see at least 60 active bulk rx URBs in > > use simultaneously for >10s, so the queueing is certainly working, at > > least for the huge majority of URBs. The point when the dropped URB > > occurs there are only around 28 in use. > > > > > > Although I said 16 iPads, this problem will happen with fewer, it > > > > just becomes rarer. > > > > > > How many? Please keep in mind that you probably have 5 hubs > > > connected, which take 5 INT endpoints permanently. > > > > I have not completely characterised this. As I said, it gets rarer > > with fewer and so the test times get longer and longer. There are in > > fact 4 hubs in use but there is also an FTDI USB-serial adapter. The > > iPads themselves have four rx bulk URBs from if#1, and a bulk and int > > rx URBs from if#0, so each iPad can use up 6 controller endpoints, so > > the queueing code is being put into use very early with as little as 2 > > or 3 iPads connected. I have not seen problems with this few iPads, > > but that may be just because I have not been able to test for long > > enough. It all rather depends on whether there is a corner case not > > handled properly that I am triggering reliably just due to the sheer > > volume of USB traffic or else it may be caused by something that is > > CPU load dependent. If this is just a rare corner case then it could > > affect anyone at some point or other. > > > > The pattern that I see is always the same. Each iPad is started > > talking to by the application, one at a time, about 0.5s between each > > start. At the start 4 rx URBs are requested on device endpoint #5, > > then a tx URB is sent to device endpoint 4. The iPad replies to this > > tx URB on one of the rx URBs. I see this rx URB on the wire and the > > ACK from the controller. In the working case there is then lots of > > data transferred over those 4 rx URBs with the occasional prompt from > > a tx URB. In the failing case there is no more traffic on that > > interface as the application never sees the initial reply. I have no > > control over this application, its behaviour cannot be changed. > > > > As I said this is not trivial to debug. I will see what I can do with my > limited > spare time. > > Regards, > -Bin.
I don't know if you have had any time to look at this yet, but thank you for being willing to look at it. I have been doing what I can and have become suspicious of the code to rotate the queue on ep1 after a NAK_TIMEOUT, musb_bulk_nak_timeout. This code is not clearing REQPKT as mentioned in 16.3.8.2.2.1.2 of the TRM and rotates the queue but leaves the reprogramming of the endpoint until later. This seems very racy to me. Tomorrow I will try clearing REQPKT before DATAERR_NAKTIMEOUT to see what happens. Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html