On 09/10/13 07:56, Martin Pieuchot wrote:
> On 10/09/13(Tue) 07:15, RD Thrush wrote:
>> On 09/10/13 04:42, Martin Pieuchot wrote:
>>> [...]
>>>
>>> Thanks for this detailed bug report.
>>>
>>> You're saying that you have 2 amd64 systems with the same problem, but
>>> I see the dmesg for only one machine.  Does the other have the same
>>> ehci controller?
>>
>> Apparently one is ATI and the other Intel.  
>> <http://arp.thrush.com/openbsd/ehci_idone/v1/> has two console captures, 
>> "v1.1" and "v1.2", for the other machine after an ehci_idone hang (I hadn't 
>> made the panic patch yet).  I was able to generate a ddb interrupt to stop 
>> the spew and gather some additional ddb info.  The aforementioned directory
>> also has acpidump, pcidump, biosdecode, and dmidecode previously collected 
>> from the same kernel.
>>
>> If you want/need further info about the 'v1' machine, let me know and I'll 
>> boot OpenBSD and get the info.
> 
> It would be nice if you could reproduce the manipulation you did with
> the other machine and set ehcidebug to 5 before switching your kvm.
> 
>>> The problem you are seeing is related to the way ehci transfers are
>>> aborted.  The abort process is subtly broken.
>>>
>>> For the archives, what happens in your case is that the timeout for
>>> one of the transfers fires and enqueues an abort task (ehci_timeout
>>> in your log).  This abort task gets scheduled, tries to deactivate
>>> the qTDs, asks for an Interrupt on Async Advance Doorbell and goes
>>> to sleep (ehci_sync_hc in your log).
>>> Then the interrupt happens (ehci_intr1: door bell), wakes up the
>>> task and goes into the soft interrupt path to process the finished
>>> transfers.  Here the driver discovers that the transfer that timed
>>> out is finished (whoa!) and tries to handle it.  But since this
>>> transfer has been marked as TIMEOUT (ehci_idone: aborted in your
>>> log), it does nothing and bails.
>>>
>>> Apparently the abort task never gets rescheduled and your transfer
>>> is never removed from the list, certainly because the hardware
>>> keeps interrupting your system, so you're livelocked ;)
>>>
>>> But all of that happens because a timeout fires for one of your
>>> transfers.  Apparently some ATI controllers need one more quirk,
>>> as your problem looks like a dropped interrupt.  Does the diff
>>> below help?
>>
>> Thank you very much for the detailed analysis and patch.  I'll build a 
>> -current kernel and try it.
>>
>> Would there be a complementary patch for the (above) Intel ehci controller?
> 
> I'm not even sure this will avoid your problem.  A proper fix would be to
> stop trying to deactivate the transfer descriptors, as it obviously
> doesn't work, and just remove them from the list.  Does anybody want to
> take the time to do that? :)
> 
> Otherwise you can just buy a non-crappy kvm ;)

Does anyone have suggestions for a non-crappy kvm?

For the record, this crappy kvm is a Trendnet TK-407.
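
For the archives, the livelock described above can be modeled in a few
lines of user-space C.  This is only a toy sketch and not the ehci(4)
driver code: every name in it (model_idone, model_softintr,
model_abort_by_removal, struct xfer, struct ctrl) is made up for
illustration, and the "hardware" is just a flag.  It assumes the
completion handler bails on a transfer marked TIMEOUT without unlinking
it, as in the analysis quoted above, and contrasts that with the
suggested direction of simply removing the transfer from the list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these names are the real ehci(4) code. */
enum xfer_status { XFER_IN_PROGRESS, XFER_TIMEOUT, XFER_DONE };

struct xfer {
	enum xfer_status status;
	int hw_done;		/* hardware reports the qTDs as finished */
	struct xfer *next;
};

struct ctrl {
	struct xfer *head;	/* list of pending transfers */
};

/* Unlink x from the pending list; returns 1 if it was found. */
static int
xfer_remove(struct ctrl *sc, struct xfer *x)
{
	struct xfer **pp;

	for (pp = &sc->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == x) {
			*pp = x->next;
			x->next = NULL;
			return 1;
		}
	}
	return 0;
}

/* Models the completion handler: a timed-out transfer is skipped. */
static void
model_idone(struct ctrl *sc, struct xfer *x)
{
	if (x->status == XFER_TIMEOUT)
		return;		/* "aborted": bail, transfer stays queued */
	x->status = XFER_DONE;
	xfer_remove(sc, x);
}

/* One soft interrupt pass; returns how many transfers are still queued. */
static int
model_softintr(struct ctrl *sc)
{
	struct xfer *x, *next;
	int left = 0;

	for (x = sc->head; x != NULL; x = next) {
		next = x->next;	/* saved first: model_idone may unlink x */
		if (x->hw_done)
			model_idone(sc, x);
	}
	for (x = sc->head; x != NULL; x = x->next)
		left++;
	return left;
}

/* Run up to max_passes passes; returns passes used, or -1 if stuck. */
static int
model_run(struct ctrl *sc, int max_passes)
{
	int i;

	assert(max_passes > 0);
	for (i = 1; i <= max_passes; i++)
		if (model_softintr(sc) == 0)
			return i;
	return -1;
}

/* The suggested direction: on abort, just unlink the transfer. */
static void
model_abort_by_removal(struct ctrl *sc, struct xfer *x)
{
	x->status = XFER_TIMEOUT;
	xfer_remove(sc, x);
}
```

With one transfer that is both marked XFER_TIMEOUT and reported done by
the "hardware", model_run() never drains the list no matter how many
passes it gets, which is the livelock in miniature.  After
model_abort_by_removal() the list is empty and the next pass has nothing
left to trip over.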
