Re: Help troubleshooting ehci_idone hang.

RD Thrush Thu, 12 Sep 2013 05:18:08 -0700

On 09/12/13 05:15, Martin Pieuchot wrote:
> On 11/09/13(Wed) 11:03, RD Thrush wrote:
>> On 09/10/13 07:56, Martin Pieuchot wrote:
>>> On 10/09/13(Tue) 07:15, RD Thrush wrote:
>>>> On 09/10/13 04:42, Martin Pieuchot wrote:
>>>>> [...]
>>>>>
>>>>> Thanks for this detailed bug report.
>>>>>
>>>>> You're saying that you have 2 amd64 systems with the same problem but
>>>>> I see only the dmesg for one machine; does the other have the same ehci
>>>>> controller?
>>>>
>>>> Apparently one is ATI and the other Intel.  
>>>> <http://arp.thrush.com/openbsd/ehci_idone/v1/> has two console captures, 
>>>> "v1.1" and "v1.2", for the other machine after an ehci_idone hang (I 
>>>> hadn't made the panic patch yet).  I was able to generate a ddb interrupt 
>>>> to stop the spew and gather some additional ddb info.  The aforementioned 
>>>> directory also has acpidump, pcidump, biosdecode, and dmidecode previously 
>>>> collected from the same kernel.
>>>>
>>>> If you want/need further info about the 'v1' machine, let me know and I'll 
>>>> boot OpenBSD and get the info.
>>>
>>> It would be nice if you could reproduce the manipulation you did with
>>> the other machine and set ehcidebug to 5 before switching your kvm.
>>
>> With ehcidebug=5 on the v1 machine, switching the kvm resulted in a continual
>> ddb loop. I wasn't able to generate a ddb interrupt via the serial console;
>> however, the pc keyboard was able to drop into ddb where I collected some
>> additional info.
>>
>> 'boot sync' resulted in the panic I'd patched (earlier in thread) to stop the
>> initial hang.  I had to do a hard reset to regain control.
>>
>> <http://arp.thrush.com/openbsd/ehci_idone/v1/v1-2> has the capture of the
>> serial console for that session.
> 
> Could you try the diff below on the v1 machine and tell me if it helps?


Thanks, I don't think it helped...

After booting the new kernel, upon first use of the kvm switch, the serial
console began filling with diagnostics which (to my untrained eye) looked
similar to the pre-patch session, v1-2. Since I was unable to ssh in remotely, I
broke into ddb and set ehcidebug=1, but I still could not ssh in. Setting
ehcidebug=0 allowed remote ssh, and normal operation resumed.

<http://arp.thrush.com/openbsd/ehci_idone/v1/v1-3> has the associated serial
console.

Let me know if I can provide further info.

>> WRT the other machine, x4, I installed the patch and have not yet had a
>> problem.  However, with ehcidebug=5, the following 2 line message is issued
>> about once per second:
>> ehci_intrlist_timeout
>> ehci_check_intr: ex=0xffff800000238c00
> 
> That's expected, thanks for confirming the problem cannot be reproduced
> with this diff with an ATI controller.

Since the problem has been intermittent since Nov 2012, I'm not sure enough
time or kvm usage has accumulated to be certain about that fix.

I will run a custom MP kernel with your pci/ehci_pci.c patch on the x4 machine
whenever I install a new snapshot and let you know if I notice the ehci_idone 
issue.


> Index: usbdi.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/usb/usbdi.c,v
> retrieving revision 1.58
> diff -u -p -r1.58 usbdi.c
> --- usbdi.c   6 Sep 2013 08:29:58 -0000       1.58
> +++ usbdi.c   12 Sep 2013 08:47:09 -0000
> @@ -810,6 +810,8 @@ usb_transfer_complete(struct usbd_xfer *
>       if (!repeat) {
>               /* XXX should we stop the queue on all errors? */
>               if ((xfer->status == USBD_CANCELLED ||
> +                  xfer->status == USBD_IOERROR ||
> +                  xfer->status == USBD_STALLED ||
>                    xfer->status == USBD_TIMEOUT) &&
>                   pipe->iface != NULL)                /* not control pipe */
>                       pipe->running = 0;
