Hi, On Wed, May 10, 2017 at 9:31 AM, Johan Hovold <jo...@kernel.org> wrote: > On Mon, May 08, 2017 at 10:22:08PM +0200, Stefan Wahren wrote: >> >> > Stefan Wahren <stefan.wah...@i2se.com> hat am 25. April 2017 um 20:11 >> > geschrieben: >> > >> > >> > > Stefan Wahren <stefan.wah...@i2se.com> hat am 22. April 2017 um 22:50 >> > > geschrieben: >> > > >> > > >> > > Hi, >> > > >> > > > Eric Anholt <e...@anholt.net> hat am 20. April 2017 um 20:54 >> > > > geschrieben: >> > > > >> > > > >> > > > Stefan Wahren <stefan.wah...@i2se.com> writes: >> > > > >> > > > > Hi, >> > > > > >> > > > >> Doug Anderson <diand...@chromium.org> hat am 18. April 2017 um >> > > > >> 22:41 geschrieben: >> > > > >> >> > > > >> >> > > > >> It's hard to know for sure that all of this time is really in >> > > > >> urb_enqueue(). Possible we could have task switched out and been >> > > > >> blocked elsewhere. Using ftrace to get more fine-grained timings >> > > > >> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are >> > > > >> your >> > > > >> friends here if you want to use trace_printk. >> > > > > >> > > > > i'm a newbie to ftrace, so i hope this would be helpful. >> > > > > >> > > > > # connect PL2303 to the onboard hub >> > > > > # echo 0 > options/sleep-time >> > > > > # echo 0 > function_profile_enabled >> > > > > # echo 1 > function_profile_enabled >> > > > > # ./usb_test >> > > > > # Waiting for at least 20 seconds and then disconnect PL2303 >> > > > > # echo 0 > function_profile_enabled >> > > > > # cat trace_stat/function0 >> > > > > >> > > > > Function Hit Time Avg >> > > > > s^2 >> > > > > -------- --- ---- --- >> > > > > --- >> > > > > bcm2835_handle_irq 361347 219567633 us >> > > > > 607.636 us 1485199 us >> > > > > __handle_domain_irq 1082482 212639551 us >> > > > > 196.437 us 3642030 us >> > > > > generic_handle_irq 1082482 100592051 us >> > > > > 92.927 us 50511334 us >> > > > > irq_exit 1082482 98197771 us >> > > > > 90.715 us 29649040 us >> > > > > handle_level_irq 1082482 95812379 us >> > > > > 88.511 us 51910093 us >> > > > >> > > > If I'm reading this output right, we're spending half of our interrupt >> > > > processing time in irq_exit(), so even if dwc2's interrupt was free >> > > > (the >> > > > generic_handle_irq() chain), we'd be eating about half the CPU getting >> > > > back out of the interrupt handler, right? >> > > > >> > > > I don't really know anything about DWC2 or USB, but is there any way we >> > > > could mitigate the interrupt frequency with this hardware? If nothing >> > > > else, could we loop reading gintsts until it reads back 0? >> > > >> > > first of all i updated my kernel to 4.11rc7 and the issue still >> > > occures. Today i had some time to investigate this issue further >> > > and i made some interesting observations: >> > > >> > > 1. The lockup doesn't occure always after starting usb_test. In >> > > rare cases i was able to run the program without lockup. >> > > 2. In case the lockup occured we are always able to "rescue" the >> > > BCM2835 from this state by sending some serial data to the PL2303. >> > >> > Based on this scenario i patched the interrupt routine to detect the >> > interrupt storm and normal condition. So i can dump the global and >> > host registers in both situations (bad and goodcase). >> > >> > Here is the diff between both register dumps, maybe someone see >> > something interesting: >> > >> > --- badcase.txt 2017-04-25 18:02:59.000000000 +0000 >> > +++ goodcase.txt 2017-04-25 18:02:59.000000000 +0000 >> > @@ -4,12 +4,12 @@ >> > dwc2_dump_global_registers: GAHBCFG @0xCC850008 : 0x00000031 >> > dwc2_dump_global_registers: GUSBCFG @0xCC85000C : 0x20001700 >> > dwc2_dump_global_registers: GRSTCTL @0xCC850010 : 0x80000000 >> > - dwc2_dump_global_registers: GINTSTS @0xCC850014 : 0x06000021 >> > + dwc2_dump_global_registers: GINTSTS @0xCC850014 : 0x06000029 >> > dwc2_dump_global_registers: GINTMSK @0xCC850018 : 0xF300080E >> > - dwc2_dump_global_registers: GRXSTSR @0xCC85001C : 0x00070044 >> > + dwc2_dump_global_registers: GRXSTSR @0xCC85001C : 0x000F0001 >> > dwc2_dump_global_registers: GRXFSIZ @0xCC850024 : 0x00000306 >> > dwc2_dump_global_registers: GNPTXFSIZ @0xCC850028 : 0x01000306 >> > - dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x1D080100 >> > + dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x18080100 >> > ... >> >> @Doug, @John Y., @Johan: Any further ideas to narrow down the reason >> for this interrupt storm? > > No, sorry. > > I suggested earlier that you could try commenting out the interrupt-URB > submission in a naive attempt to narrow it down since that is an obvious > difference from the ftdi driver, but then again so are the frequent ftdi > status bulk messages (and any other timing differences). > > Seems like you need to dig into dwc2.
I haven't had a chance to dig and not sure when I'll have that time. I'll keep this around in my inbox for now on the off chance that I clean it out enough to get back to this. Sorry! -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html