USB wifi stick doesn't work until I reboot the system with kernel 4.20

2019-03-11 Thread Bernhard Gebetsberger

Hi,

After upgrading from the 4.19 kernel to 4.20, my USB wifi stick doesn't 
work until I reboot the system (the problem occurs when the system had 
been shut down).

dmesg shows the following error when the issue occurs:

xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
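
To check whether a given boot actually hit this warning, the kernel log can be searched for the message text. The following is only a minimal sketch: the function name count_deq_warnings is made up for illustration, and it assumes each warning is logged on a single line, as dmesg normally prints it.

```shell
# Count how often the xHCI warning appears in a kernel log read from stdin.
# Prints the number of matching lines (0 if there are none).
count_deq_warnings() {
    # grep -c exits non-zero when nothing matches, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -c 'Set TR Deq Ptr cmd failed due to incorrect slot or ep state' || true
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | count_deq_warnings
```

Running it once before and once after the reboot gives a quick comparison of the two states without reading the full logs.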

I've already tried disabling the IOMMU, and I also tried reverting all 
rt2800usb commits that were made between 4.19 and 4.20, but this didn't 
solve the issue, so it's probably USB/xhci related.
I've attached two dmesg logs (one before rebooting, one after rebooting) 
here: https://bugzilla.kernel.org/show_bug.cgi?id=202541


Any idea how to fix this?

Thanks in advance,

Bernhard


Regression xhci_hcd cmd failed due to incorrect slot or ep state

2019-04-07 Thread Bernhard Gebetsberger

When using kernel version 4.20 or above, my USB wifi stick doesn't
work until I reboot my system. dmesg shows this message when the issue
occurs:

xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

I've found that the problem is caused by this commit:

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <mathias.ny...@linux.intel.com>
Date:   Thu Sep 20 19:13:37 2018 +0300

xhci: Use soft retry to recover faster from transaction errors

It looks like it only affects wifi sticks (various models) on specific
hardware configurations. I have this issue on my Ryzen PC, while my Intel PC
isn't affected, but someone in the kernel bug tracker mentioned that he
has this issue on an Intel notebook as well (only with USB 3.0 ports,
though).


Regression xhci_hcd: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

2019-07-22 Thread Bernhard Gebetsberger
Hi Mathias,

About 1.5 weeks ago I sent you the dmesg log and the xhci_hcd trace file. 
Is there anything else that needs to be provided? Otherwise, how should we 
proceed?

The problem has occurred since kernel version 4.20, and it looks like more and 
more people are affected by it. Most of them blame their wifi driver. Maybe 
it would be best to just revert the patch that is causing the problem?

The regression is caused by the changes in process_bulk_intr_td(), which are 
part of this commit:

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <mathias.ny...@linux.intel.com>
Date:   Thu Sep 20 19:13:37 2018 +0300

xhci: Use soft retry to recover faster from transaction errors

In case you missed the mail with the log files, I've uploaded them to 
transfer.sh: https://transfer.sh/KDEeE/dmesg and 
https://transfer.sh/14Imam/trace

- Bernhard




Regression: USB/xhci issues on some systems with newer kernel versions

2019-10-02 Thread Bernhard Gebetsberger
Hi,

There has been a regression in the xhci driver since kernel version 4.20: on 
some systems, some USB devices won't work until the system is rebooted.
The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to incorrect 
slot or ep state", although for some reason some affected USB devices don't 
trigger the error message (including the device I'm using, although I did get 
the error with previous kernel versions).
It seems this bug can also lead to system instability: one user reported in 
the bug tracker (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that 
he got a system freeze because of this when using kernel 5.3.1.

Judging by the responses in the bug tracker, it mostly affects Ryzen-based 
systems with 300 series motherboards, although there are some other affected 
systems as well. It doesn't only affect wifi/bluetooth sticks: some users 
even got this issue when connecting their smartphone or their external hard 
drive to their PC.

After enabling kernel debugging/tracing for xhci_hcd, I got the following 
messages in dmesg (short version, link to the whole file below):
[  231.185635] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 0, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x372
[  231.185642] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 1, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x373
[  231.185646] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 2, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x374
..
[  231.887681] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 4, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x633
[  231.887687] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 5, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x634
[  231.892346] xhci_hcd :38:00.4: Cancel URB 8599ca58, dev 1, ep 0x1, starting at offset 0xff388ea0
[  231.892355] xhci_hcd :38:00.4: // Ding dong!
[  231.892363] xhci_hcd :38:00.4: Cancel URB 0d35fd5d, dev 1, ep 0x1, starting at offset 0xff388ef0
[  231.892368] xhci_hcd :38:00.4: Cancel URB 74e3ee88, dev 1, ep 0x1, starting at offset 0xff388e40
[  231.892640] xhci_hcd :38:00.4: Stopped on Transfer TRB for slot 1 ep 1
[  231.892647] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ea0 (dma).
[  231.892651] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388eb0 (dma).
[  231.892653] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ec0 (dma).
[  231.892656] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ed0 (dma).
[  231.892658] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ee0 (dma).
[  231.892661] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ef0 (dma).
[  231.892663] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f00 (dma).
[  231.892666] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f10 (dma).
[  231.892668] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f20 (dma).
[  231.892670] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f30 (dma).
[  231.892672] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f40 (dma).
[  231.892675] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388e90 (dma).
[  231.892677] xhci_hcd :38:00.4: Finding endpoint context
[  231.892679] xhci_hcd :38:00.4: Cycle state = 0x1
[  231.892682] xhci_hcd :38:00.4: New dequeue segment = 5d174923 (virtual)
[  231.892685] xhci_hcd :38:00.4: New dequeue pointer = 0xff388ea0 (DMA)
[  231.892688] xhci_hcd :38:00.4: Set TR Deq Ptr cmd, new deq seg = 5d174923 (0xff388000 dma), new deq ptr = d5c5ed2a (0xff388ea0 dma), new cycle = 1
[  231.892693] xhci_hcd :38:00.4: // Ding dong!
[  231.892728] xhci_hcd :38:00.4: Successful Set TR Deq Ptr cmd, deq = @ff388ea0
[  231.897107] xhci_hcd :38:00.4: xhci_drop_endpoint called for udev 43fc1c1f
[  231.897126] xhci_hcd :38:00.4: drop ep 0x1, slot id 1, new drop flags = 0x4, new add flags = 0x0
[  231.897129] xhci_hcd :38:00.4: xhci_check_bandwidth called for udev 43fc1c1f
[  231.897137] xhci_hcd :38:00.4: // Ding dong!
[  231.898523] xhci_hcd :38:00.4: Successful Endpoint Configure command
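
A debug log like this gets long quickly, so it can help to reduce it to a few counters before comparing a working and a failing run. The sketch below is an illustration only: the function name summarise_xhci_log and the output format are invented here, and the patterns are simply the message texts visible in the excerpt above plus the "failed" warning quoted earlier in this mail.

```shell
# Summarise an xhci_hcd dynamic-debug log read from stdin.
# Prints one line with four counters: cancelled URBs, removed TDs, and
# successful / failed Set TR Deq Ptr commands.
summarise_xhci_log() {
    awk '
        /Cancel URB/                    { urbs++ }
        /Removing canceled TD/          { tds++ }
        /Successful Set TR Deq Ptr cmd/ { ok++ }
        /Set TR Deq Ptr cmd failed/     { failed++ }
        END {
            # Counters that never matched are printed as 0 by %d.
            printf "cancelled_urbs=%d removed_tds=%d deq_ok=%d deq_failed=%d\n",
                   urbs, tds, ok, failed
        }'
}

# Usage:
#   dmesg | summarise_xhci_log
```

A non-zero deq_failed value marks a run in which the warning was actually logged, which is the case the counters are meant to single out.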

I have uploaded the whole dmesg file and the tracing file to transfer.sh: 
https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace

The issues have occurred since commit f8f80be501aa2f10669585c3e328fad079d8cb3a 
"xhci: Use soft retry to recover faster from transaction errors". I think this 
commit should be reverted at least until a workaround has been found, 
especially since the next two kernel versions will be used by a lot of 
distributions (5.4 because it's an LTS kernel, and 5.5 will probably be used 
in Ubuntu 20.04), so more users would be affected by this.

Re: Regression: USB/xhci issues on some systems with newer kernel versions

2019-10-03 Thread Bernhard Gebetsberger
I sent the instructions to one of the users in the bug tracker.
Here is the download link for his logs: https://www.sendspace.com/file/413hlj

- Bernhard

On 03.10.19 at 12:23, Mathias Nyman wrote:
> On 2.10.2019 15.28, Bernhard Gebetsberger wrote:
>> Hi,
>>
>> There has been a regression in the xhci driver since kernel version 4.20: on 
>> some systems, some USB devices won't work until the system is rebooted.
>> The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to 
>> incorrect slot or ep state", although for some reason some affected USB 
>> devices don't trigger the error message (including the device I'm using, 
>> although I did get the error with previous kernel versions).
>> It seems this bug can also lead to system instability: one user reported 
>> in the bug tracker 
>> (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that he got a 
>> system freeze because of this when using kernel 5.3.1.
>>
>
> Ok, let's take a look at this.
> Some of the symptoms vary a bit in the report, so let's focus on the ones that
> show: "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
>
>> Judging by the responses in the bug tracker, it mostly affects Ryzen-based 
>> systems with 300 series motherboards, although there are some other 
>> affected systems as well. It doesn't only affect wifi/bluetooth sticks: 
>> some users even got this issue when connecting their smartphone or their 
>> external hard drive to their PC.
>
>>
>> I have uploaded the whole dmesg file and the tracing file to transfer.sh: 
>> https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace
>
> Hmm, trying to download these just shows "Not Found"
>
> Could someone with an affected system enable tracing and dynamic debug on a
> recent kernel, and take logs and traces of one failing instance where the message
> "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is seen?
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> < Trigger the issue >
>
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
>
>>
>> The issues have occurred since commit 
>> f8f80be501aa2f10669585c3e328fad079d8cb3a "xhci: Use soft retry to recover 
>> faster from transaction errors". I think this commit should be reverted at 
>> least until a workaround has been found, especially since the next two 
>> kernel versions will be used by a lot of distributions (5.4 because it's an 
>> LTS kernel, and 5.5 will probably be used in Ubuntu 20.04), so more users 
>> would be affected by this.
>>
>
> There is some time left before 5.4 is out, let's see if we can find the root 
> cause first.
>
> -Mathias
>
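
The capture steps quoted above can be collected into two small shell functions so that they are easier to repeat. This is a sketch under stated assumptions, not an official procedure: the function names, the DEBUGFS variable and the output file names dmesg.txt and xhci-trace.txt are chosen here for illustration, while the paths and values written are the ones from the quoted instructions. On a real system the functions have to be run as root.

```shell
# Root of debugfs; overridable so the functions can be tried on a scratch directory.
DEBUGFS="${DEBUGFS:-/sys/kernel/debug}"

# Enable dynamic debug for xhci_hcd and usbcore and turn on the xhci trace events.
enable_xhci_debug() {
    # Mount debugfs only if its tracing directory is not already visible.
    [ -d "$DEBUGFS/tracing" ] || mount -t debugfs none "$DEBUGFS"
    echo 'module xhci_hcd =p' > "$DEBUGFS/dynamic_debug/control"
    echo 'module usbcore =p'  > "$DEBUGFS/dynamic_debug/control"
    echo 81920 > "$DEBUGFS/tracing/buffer_size_kb"
    echo 1     > "$DEBUGFS/tracing/events/xhci-hcd/enable"
}

# After triggering the issue, save both logs into directory $1
# (default: the current directory).
collect_xhci_logs() {
    out="${1:-.}"
    dmesg > "$out/dmesg.txt"
    cat "$DEBUGFS/tracing/trace" > "$out/xhci-trace.txt"
}

# Usage, as root:
#   enable_xhci_debug
#   ... trigger the issue ...
#   collect_xhci_logs /tmp
```

Keeping the enable and collect steps separate leaves room to trigger the issue in between, which matches the order of the quoted instructions.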


Re: Regression: USB/xhci issues on some systems with newer kernel versions

2019-10-10 Thread Bernhard Gebetsberger
I've just noticed that this problem also occurs when unplugging an affected 
device.
When unplugging the device, the error
    "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
is shown, even though I don't get this error when plugging the device in.

Here is a link to the dmesg and trace logs:
https://gist.github.com/Brn9hrd7/011405276fdf7a699dcc5cb83c67d276
Maybe there is something useful in there that was missing in the previous logs.

- Bernhard

