USB wifi stick doesn't work until I reboot the system with kernel 4.20
Hi,

After upgrading from kernel 4.19 to 4.20, my USB wifi stick doesn't work until I reboot the system (the problem occurs after the system was shut down). dmesg shows the following error when the issue occurs:

xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

I've already tried disabling the IOMMU, and I also tried reverting all rt2800usb commits made between 4.19 and 4.20, but neither solved the issue, so it's probably USB/xhci related.

I've attached two dmesg logs (one before rebooting, one after rebooting) here: https://bugzilla.kernel.org/show_bug.cgi?id=202541

Any idea how to fix this?

Thanks in advance,
Bernhard
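The dmesg check described in the message above can be scripted. The following is a minimal sketch, not part of the original report: the function names `find_deq_warnings` and `count_deq_warnings` are hypothetical, and the only thing taken from the thread is the warning text itself. Both helpers read kernel log text from stdin, so they work on live `dmesg` output as well as on a saved log.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).
# The search string is the xhci_hcd warning quoted in the report.

# Print every line of the kernel log (read from stdin) that contains the warning.
find_deq_warnings() {
    grep -F 'Set TR Deq Ptr cmd failed due to incorrect slot or ep state'
}

# Print how many such lines the log contains (prints 0 when there are none).
count_deq_warnings() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -cF 'Set TR Deq Ptr cmd failed due to incorrect slot or ep state' || true
}
```

Typical use would be `dmesg | find_deq_warnings` right after plugging in the device, or `count_deq_warnings < saved-dmesg.txt` on an attached log (the file name here is only an example).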
Regression xhci_hcd cmd failed due to incorrect slot or ep state
When using kernel version 4.20 or above, my USB wifi stick doesn't work until I reboot my system. dmesg shows this message when the issue occurs:

xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

I've found that the problem is caused by this commit:

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <mathias.ny...@linux.intel.com>
Date: Thu Sep 20 19:13:37 2018 +0300

    xhci: Use soft retry to recover faster from transaction errors

It looks like it only affects wifi sticks (various models) on specific hardware configurations. I have this issue on my Ryzen PC, while my Intel PC isn't affected, but someone in the kernel bug tracker mentioned that he has this issue on an Intel notebook as well (only with USB 3.0 ports, though).
Regression xhci_hcd: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state
Hi Mathias,

About 1.5 weeks ago I sent you the dmesg log and the xhci_hcd tracing file. Is there anything else that needs to be provided? How should we proceed otherwise?

The problem has occurred since kernel version 4.20, and it looks like more and more people are affected by it; most of them blame their wifi driver. Maybe it would be best to just revert the patch that is causing the problem?

The regression is caused by the changes in process_bulk_intr_td(), which are part of this commit:

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <mathias.ny...@linux.intel.com>
Date: Thu Sep 20 19:13:37 2018 +0300

    xhci: Use soft retry to recover faster from transaction errors

In case you missed the mail with the log files, I've uploaded them to transfer.sh: https://transfer.sh/KDEeE/dmesg and https://transfer.sh/14Imam/trace

- Bernhard
Regression: USB/xhci issues on some systems with newer kernel versions
Hi,

There has been a regression in the xhci driver since kernel version 4.20: on some systems, some USB devices won't work until the system is rebooted. The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state", although some affected USB devices don't produce the error message (including the device I'm using, although I did get the error with previous kernel versions).

It seems this bug can also lead to system instability: one user reported in the bug tracker (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that he got a system freeze because of it when using kernel 5.3.1.

Judging by the responses in the bug tracker, it mostly affects Ryzen-based systems with 300 series motherboards, although some other systems are affected as well. It doesn't only affect wifi/bluetooth sticks; some users even got this issue when connecting their smartphone or their external hard drive to their PC.

After enabling kernel debugging/tracing for xhci_hcd I got the following messages in dmesg (short version, link to the whole file below):

[ 231.185635] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 0, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x372
[ 231.185642] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 1, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x373
[ 231.185646] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 2, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x374
..
[ 231.887681] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 4, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x633
[ 231.887687] xhci_hcd :38:00.4: xhci_get_isoc_frame_id: index 5, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x634
[ 231.892346] xhci_hcd :38:00.4: Cancel URB 8599ca58, dev 1, ep 0x1, starting at offset 0xff388ea0
[ 231.892355] xhci_hcd :38:00.4: // Ding dong!
[ 231.892363] xhci_hcd :38:00.4: Cancel URB 0d35fd5d, dev 1, ep 0x1, starting at offset 0xff388ef0
[ 231.892368] xhci_hcd :38:00.4: Cancel URB 74e3ee88, dev 1, ep 0x1, starting at offset 0xff388e40
[ 231.892640] xhci_hcd :38:00.4: Stopped on Transfer TRB for slot 1 ep 1
[ 231.892647] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ea0 (dma).
[ 231.892651] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388eb0 (dma).
[ 231.892653] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ec0 (dma).
[ 231.892656] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ed0 (dma).
[ 231.892658] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ee0 (dma).
[ 231.892661] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388ef0 (dma).
[ 231.892663] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f00 (dma).
[ 231.892666] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f10 (dma).
[ 231.892668] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f20 (dma).
[ 231.892670] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f30 (dma).
[ 231.892672] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388f40 (dma).
[ 231.892675] xhci_hcd :38:00.4: Removing canceled TD starting at 0xff388e90 (dma).
[ 231.892677] xhci_hcd :38:00.4: Finding endpoint context
[ 231.892679] xhci_hcd :38:00.4: Cycle state = 0x1
[ 231.892682] xhci_hcd :38:00.4: New dequeue segment = 5d174923 (virtual)
[ 231.892685] xhci_hcd :38:00.4: New dequeue pointer = 0xff388ea0 (DMA)
[ 231.892688] xhci_hcd :38:00.4: Set TR Deq Ptr cmd, new deq seg = 5d174923 (0xff388000 dma), new deq ptr = d5c5ed2a (0xff388ea0 dma), new cycle = 1
[ 231.892693] xhci_hcd :38:00.4: // Ding dong!
[ 231.892728] xhci_hcd :38:00.4: Successful Set TR Deq Ptr cmd, deq = @ff388ea0
[ 231.897107] xhci_hcd :38:00.4: xhci_drop_endpoint called for udev 43fc1c1f
[ 231.897126] xhci_hcd :38:00.4: drop ep 0x1, slot id 1, new drop flags = 0x4, new add flags = 0x0
[ 231.897129] xhci_hcd :38:00.4: xhci_check_bandwidth called for udev 43fc1c1f
[ 231.897137] xhci_hcd :38:00.4: // Ding dong!
[ 231.898523] xhci_hcd :38:00.4: Successful Endpoint Configure command

I have uploaded the whole dmesg file and the tracing file to transfer.sh: https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace

The issues have occurred since commit f8f80be501aa2f10669585c3e328fad079d8cb3a "xhci: Use soft retry to recover faster from transaction errors". I think this commit should be reverted, at least until a workaround has been found, especially since the next two kernel versions will be used by a lot of distributions (5.4 because it's an LTS kernel, and 5.5 will probably be used in Ubuntu 20.04), so more users would be affected by this.
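When comparing a working and a failing run, it can help to tally the Set TR Deq Ptr activity in a saved debug log instead of reading it line by line. The sketch below is an illustration only: `summarize_deq_log` is a hypothetical name, and the search strings are simply the message texts that appear in the debug excerpt and in the warning quoted in this thread; other kernel versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: summarise Set TR Deq Ptr activity in a saved dmesg file.
# $1 is the path to the log; the patterns are copied from messages in this thread.
summarize_deq_log() {
    log=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    issued=$(grep -c 'Set TR Deq Ptr cmd, new deq seg' "$log" || true)
    ok=$(grep -c 'Successful Set TR Deq Ptr cmd' "$log" || true)
    failed=$(grep -c 'Set TR Deq Ptr cmd failed' "$log" || true)
    canceled=$(grep -c 'Removing canceled TD' "$log" || true)
    printf 'issued=%s ok=%s failed=%s canceled_tds=%s\n' \
        "$issued" "$ok" "$failed" "$canceled"
}
```

A run where `failed` is non-zero, or where `issued` exceeds `ok`, would be the interesting one to send along with the trace.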
Re: Regression: USB/xhci issues on some systems with newer kernel versions
I sent the instructions to one of the users in the bug tracker.
Here is the download link for his logs: https://www.sendspace.com/file/413hlj

- Bernhard

On 03.10.19 at 12:23, Mathias Nyman wrote:
> On 2.10.2019 15.28, Bernhard Gebetsberger wrote:
>> Hi,
>>
>> There has been a regression in the xhci driver since kernel version 4.20, on
>> some systems some usb devices won't work until the system gets rebooted.
>> The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to
>> incorrect slot or ep state", although for some reason there are some usb
>> devices that are affected by this issue but don't throw the error
>> message(including the device I'm using, I got the error in previous kernel
>> versions though).
>> It seems like this bug can also lead to system instability, one user
>> reported in the bug
>> tracker(https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that he got
>> a system freeze because of this when using kernel 5.3.1.
>>
>
> Ok, let's take a look at this.
> Some of the symptoms vary a bit in the report, so let's focus on ones that
> show: "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
>
>> When looking at the responses in the bug tracker, it looks like it mostly
>> affects Ryzen based systems with 300 series motherboards, although there are
>> some other affected systems as well. It doesn't only affect wifi/bluetooth
>> sticks, some users even got this issue when connecting their smartphone or
>> their external hard drive to their PC.
>
>>
>> I have uploaded the whole dmesg file and the tracing file to transfer.sh:
>> https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace
>
> Hmm, trying to download these just shows "Not Found"
>
> Could someone with an affected system enable tracing and dynamic debug on a
> recent kernel, take logs and traces of one failing instance where the message
> "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is seen.
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> < Trigger the issue >
>
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
>
>>
>> The issues occur since commit f8f80be501aa2f10669585c3e328fad079d8cb3a
>> "xhci: Use soft retry to recover faster from transaction errors". I think
>> this commit should be reverted at least until a workaround has been found,
>> especially since the next two kernel versions will be used by a lot of
>> distributions(5.4 because it's a LTS kernel and 5.5 will probably be used in
>> Ubuntu 20.04) so more users would be affected by this.
>>
>
> There is some time left before 5.4 is out, let's see if we can find the root
> cause first.
>
> -Mathias
>
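For anyone collecting the requested logs, the steps from the instructions quoted above can be wrapped in a small script. This is a rough sketch under stated assumptions, not something posted in the thread: the function names, the `DEBUGFS` variable, and the output file names are hypothetical, while the paths and values written are exactly those from the quoted instructions. It needs to run as root on a kernel with dynamic debug and tracing enabled.

```shell
#!/bin/sh
# Sketch only: function names, DEBUGFS and the output file names are assumptions.
# DEBUGFS defaults to the mount point used in the quoted instructions.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

# Mount debugfs unless /proc/mounts already lists it at $DEBUGFS.
mount_debugfs() {
    grep -qs " $DEBUGFS debugfs " /proc/mounts ||
        mount -t debugfs none "$DEBUGFS"
}

# Enable dynamic debug for xhci_hcd and usbcore, enlarge the trace buffer,
# and switch on the xhci-hcd trace events (same writes as in the instructions).
enable_xhci_debug() {
    echo 'module xhci_hcd =p' > "$DEBUGFS/dynamic_debug/control" &&
    echo 'module usbcore =p' > "$DEBUGFS/dynamic_debug/control" &&
    echo 81920 > "$DEBUGFS/tracing/buffer_size_kb" &&
    echo 1 > "$DEBUGFS/tracing/events/xhci-hcd/enable"
}

# After triggering the issue, save dmesg and the trace buffer into directory $1.
collect_xhci_logs() {
    outdir=$1
    mkdir -p "$outdir" &&
    dmesg > "$outdir/dmesg.txt" &&
    cat "$DEBUGFS/tracing/trace" > "$outdir/xhci-trace.txt"
}
```

The intended order is `mount_debugfs && enable_xhci_debug`, then plug in (or unplug) the affected device, then `collect_xhci_logs ./xhci-logs` (the directory name is only an example).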
Re: Regression: USB/xhci issues on some systems with newer kernel versions
I've just noticed that this problem also occurs when unplugging an affected device. When I unplug the device, the error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is shown, even though I don't get this error when plugging the device in.

Here is a link to the dmesg and trace logs: https://gist.github.com/Brn9hrd7/011405276fdf7a699dcc5cb83c67d276
Maybe there is something useful in there that was missing in the previous logs.

- Bernhard