On 09/10/13 07:56, Martin Pieuchot wrote:
> On 10/09/13(Tue) 07:15, RD Thrush wrote:
>> On 09/10/13 04:42, Martin Pieuchot wrote:
>>> [...]
>>>
>>> Thanks for this detailed bug report.
>>>
>>> You're saying that you have 2 amd64 systems with the same problem but
>>> I see only the dmesg for one machine, does the other has the same ehci
>>> controller?
>>
>> Apparently one is ATI and the other Intel.
>> <http://arp.thrush.com/openbsd/ehci_idone/v1/> has two console captures,
>> "v1.1" and "v1.2", for the other machine after an ehci_idone hang (I hadn't
>> made the panic patch yet). I was able to generate a ddb interrupt to stop
>> the spew and gather some additional ddb info. The forementioned directory
>> also has acpidump, pcidump, biosdecode, and dmidecode previously collected
>> from the same kernel.
>>
>> If you want/need further info about the 'v1' machine, let me know and I'll
>> boot OpenBSD and get the info.
>
> It would be nice if you could reproduce the manipulation you did with
> the other machine and set ehcidebug to 5 before switching your kvm.
>
>>> The problem you are seeing is related to the way ehci transfers are
>>> aborted. The abortion process is subtly broken.
>>>
>>> For the archives what happens in your case is that the timeout for
>>> one of the transfers fires and enqueue an abort task (ehci_timeout
>>> in your log). This abort task get scheduled tries to deactivate
>>> the qTDs, asks for an Interrupt on Async Advance Doorbell and goes
>>> to sleep (ehci_sync_hc in your log).
>>> Then the interrupt happens (ehci_intr1: door bell), wakeups the
>>> task and goes into the softinterrupt path to process the finished
>>> transfers. Here the driver discovers that the transfer that timed
>>> out is finished (whoa!) and tries to handles it. But since this
>>> transfer has been marked as TIMEOUT (ehci_idone: aborted in your
>>> log), it does nothing and bails.
>>>
>>> Apparently the abort task never get rescheduled and your transfer
>>> is never removed from the list, certainly because the hardware
>>> keeps interrupting your systems, so you're livelock ;)
>>>
>>> But all of that happens because a timeout fires for one of your
>>> transfers, apparently some ATI controllers needs one more quirk,
>>> as your problem looks like a dropped interrupt. Does the diff
>>> below helps?
>>
>> Thank you very much for the detailed analysis and patch. I'll build a
>> -current kernel and try it.
>>
>> Would there be a complementary patch for the (above) Intel ehci controller?
>
> I'm not even sure this will avoid your problem, a proper fix would be to
> stop trying to deactivate the transfer descriptors, as it obviously
> doesn't work, and just remove them from the list. Does anybody want to
> take the time to do that? :)
>
> Otherwise you can just buy a non crappy kvm ;)

Does anyone have suggestions for a non-crappy kvm? For the record, this crappy kvm is a Trendnet TK-407.