Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.

2016-05-25 Thread Joe Lawrence
On 05/12/2016 07:57 AM, Mathias Nyman wrote: > If commands timeout we mark them for abortion, then stop the command > ring, and turn the commands to no-ops and finally restart the command > ring. > > If the host is working properly the no-op commands will finish and > pending completions are calle

Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.

2016-05-13 Thread Joe Lawrence
On 05/12/2016 07:57 AM, Mathias Nyman wrote: > If commands timeout we mark them for abortion, then stop the command > ring, and turn the commands to no-ops and finally restart the command > ring. > > If the host is working properly the no-op commands will finish and > pending completions are calle

Re: xhci_handle_command_timeout and wait_for_completion

2016-05-10 Thread Joe Lawrence
On 05/09/2016 06:18 AM, Mathias Nyman wrote: > On 06.05.2016 23:32, Joe Lawrence wrote: >> ...snip... >> >> Given that the default command timeout is 5 seconds, it seems strange to >> hit a 120 second hung task warning in this instance. I can only think >> that

xhci_handle_command_timeout and wait_for_completion

2016-05-06 Thread Joe Lawrence
Hello Mathias, In c311e391a7ef "xhci: rework command timeout and cancellation," xHCI command timeouts were refactored to flow through xhci_handle_command_timeout. We've seen a few instances of hangs/crashes with upstream and RHEL7.2 kernels where someone gets stuck on wait_for_completion(cmd->com

Re: [PATCH v2] xhci: Cleanup only when releasing primary hcd.

2016-04-29 Thread Joe Lawrence
On 04/28/2016 04:30 AM, Roger Quadros wrote: > Hi Joe, > > On 27/04/16 23:41, Joe Lawrence wrote: >> Hello Mathias, Roger, Gabriel >> >> [ ... snip ... ] >> In the meantime I was browsing recent linux-scsi archives and noticed >> Gabriel's [PATCH v

Re: [PATCH v2] xhci: Cleanup only when releasing primary hcd.

2016-04-27 Thread Joe Lawrence
Hello Mathias, Roger, Gabriel I've been chasing strange MSI / legacy IRQ behavior from xHCI for a couple days and wanted to report a few things that may be affected by Gabriel's recent "xhci: Cleanup only when releasing primary hcd" patch (more on this at the bottom). After 8c24d6d7b09d "usb: xhc

Re: use after free of bos pointer in usb_reset_and_verify_device?

2016-02-29 Thread Joe Lawrence
On 02/29/2016 11:41 AM, Greg KH wrote: > On Mon, Feb 29, 2016 at 11:06:55AM -0500, Joe Lawrence wrote: >> Hi Alan, Changbin, Xenia, >> >> I've twice encountered a crash on system reboot in usb_disable_device >> that looks to be a bos descriptor use-after-free.

use after free of bos pointer in usb_reset_and_verify_device?

2016-02-29 Thread Joe Lawrence
Hi Alan, Changbin, Xenia, I've twice encountered a crash on system reboot in usb_disable_device that looks to be a bos descriptor use-after-free. The machine in question is running a 4.5-rc5 kernel with proprietary and out-of-tree Stratus drivers that facilitate device-removal testing -- these dr

[PATCH for-usb-linus] xhci: harden xhci_find_next_ext_cap against device removal

2016-02-03 Thread Joe Lawrence
after commit d5ddcdf4d672 ("xhci: rework xhci extended capability list parsing functions"). Signed-off-by: Joe Lawrence --- Patch based on Mathias's for-usb-linus tree as it addresses a crashing bug present in the current release candidate. The crash is repeatable on a Stratus pla

Re: xhci list corruption on sysfs removal

2016-01-21 Thread Joe Lawrence
On 12/23/2015 08:40 AM, Joe Lawrence wrote: > On 12/21/2015 10:07 AM, Mathias Nyman wrote: >> Hi >> >> On 18.12.2015 18:48, Joe Lawrence wrote: >>> Hello Roger and Mathias, >>> >>> Running with slub_debug=FZPU and removing an XHCI host controller vi

Re: xhci list corruption on sysfs removal

2015-12-23 Thread Joe Lawrence
On 12/21/2015 10:07 AM, Mathias Nyman wrote: Hi On 18.12.2015 18:48, Joe Lawrence wrote: Hello Roger and Mathias, Running with slub_debug=FZPU and removing an XHCI host controller via sysfs, I've hit a use-after-free that I've bisected to: 8c24d6d7b09deee3036ddc4f2b81b53b28c8f

xhci list corruption on sysfs removal

2015-12-18 Thread Joe Lawrence
Hello Roger and Mathias, Running with slub_debug=FZPU and removing an XHCI host controller via sysfs, I've hit a use-after-free that I've bisected to: 8c24d6d7b09deee3036ddc4f2b81b53b28c8f877 is the first bad commit commit 8c24d6d7b09deee3036ddc4f2b81b53b28c8f877 Author: Roger Quadros Da

Re: [PATCH] xhci: Workaround to get Intel xHCI reset working more reliably

2015-10-19 Thread Joe Lawrence
ooking into this XHCI reset quirk. We ran this patch on a Stratus FT machine and it successfully reset ~1500 times over the weekend without any issue. Feel free to add: Tested-by: Joe Lawrence -- Joe

Re: xhci irq event bogus return value ffffff94

2015-04-21 Thread Joe Lawrence
On 04/21/2015 08:21 AM, Mathias Nyman wrote: [...] > On the other hand if we just removed xhci, and share the interrupt with > somebody else who is > also generating an interrupts, then we would probably continue to read > 0xffffffff from the status reg and > should return IRQ_NONE. Yes, I thin

Re: xhci irq event bogus return value ffffff94

2015-04-20 Thread Joe Lawrence
On Mon, Apr 20, 2015 at 01:35:40PM -0400, Alan Stern wrote: > On Mon, 20 Apr 2015, Joe Lawrence wrote: > > > So -ESHUTDOWN = -108 (0xffffff94) provoked bad_action_ret into reporting > > a bogus return value and stack trace above. > > As far as I know, -Eanything is never

xhci irq event bogus return value ffffff94

2015-04-20 Thread Joe Lawrence
options about the following patch? Regards, -- Joe [1] https://bugzilla.redhat.com/show_bug.cgi?id=692425 --->8-- -->8-- -->8-- From ff69f1bb5601ce5f0e70bb2e97c65456e13dc38e Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Mon, 20 Apr 2015 11:14:47 -0400 Subject: [PATCH] xhci: gracefully h

Re: usb-storage URB use-after-free

2015-01-30 Thread Joe Lawrence
On Thu, 29 Jan 2015 11:42:18 -0500 Alan Stern wrote: > On Wed, 28 Jan 2015, Joe Lawrence wrote: > > > This one should have gone over to linux-usb. > > > > -- Joe > > > > On 01/28/2015 05:04 PM, Joe Lawrence wrote: > > > Hello linux-usb, > >

Re: usb-storage URB use-after-free

2015-01-28 Thread Joe Lawrence
This one should have gone over to linux-usb. -- Joe On 01/28/2015 05:04 PM, Joe Lawrence wrote: > Hello linux-usb, > > We've hit a USB use-after-free on Stratus HW during device removal tests. > We're running fio disk I/O to a scsi disk hanging off USB when the USB

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-17 Thread Joe Lawrence
On Tue, 16 Sep 2014 15:29:20 -0400 Alan Stern wrote: > ... And now I see the problem. It's these two lines just before the > "switch": > > if (ehci->rh_state < EHCI_RH_RUNNING) > qh->qh_state = QH_STATE_IDLE; > > That undoubtedly caused us to destroy the QH directly withou

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-16 Thread Joe Lawrence
On Tue, 16 Sep 2014 11:56:14 -0400 Alan Stern wrote: > On Tue, 16 Sep 2014, Joe Lawrence wrote: > > > > Anyway, the log above means that a QH was linked before the HC died, > > > but then it was never unlinked. Please add a line at the start of > > > ehci_e

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-16 Thread Joe Lawrence
On Tue, 16 Sep 2014 10:44:45 -0400 Alan Stern wrote: > On Tue, 16 Sep 2014, Joe Lawrence wrote: > > > > You can check for this. Sprinkle ehci_info messages throughout > > > ehci_stop, printing the value of ehci->async->qh_next.qh. It should be > > >

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-16 Thread Joe Lawrence
On Mon, 15 Sep 2014 11:59:15 -0400 Alan Stern wrote: > On Sat, 13 Sep 2014, Joe Lawrence wrote: > > > Hi Alan, > > > > I've collected 16 crashes since kicking off automated tests a little > > over 24 hrs ago. > > > > Each crash hit the BU

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-13 Thread Joe Lawrence
On Fri, 12 Sep 2014, Alan Stern wrote: > On Fri, 12 Sep 2014, Joe Lawrence wrote: > > > On Fri, 12 Sep 2014 11:31:46 -0400 > > Alan Stern wrote: > > > > > On Thu, 11 Sep 2014, Joe Lawrence wrote: > > > > > > > Hi Alan, > > > >

Re: Hitting "unused qh not empty" BUG in qh_destroy

2014-09-12 Thread Joe Lawrence
On Fri, 12 Sep 2014 11:31:46 -0400 Alan Stern wrote: > On Thu, 11 Sep 2014, Joe Lawrence wrote: > > > Hi Alan, > > > > I've got another USB bug to report that manifests during automated > > device removal testing on RHEL7. This one hits the BUG() inside

Hitting "unused qh not empty" BUG in qh_destroy

2014-09-11 Thread Joe Lawrence
Hi Alan, I've got another USB bug to report that manifests during automated device removal testing on RHEL7. This one hits the BUG() inside qh_destroy: PID: 139 TASK: 881054101960 CPU: 22 COMMAND: "kworker/u66:0" #0 [881054113540] machine_kexec at 810411b1 #1 [88105411

Re: crash in recursively_mark_NOTATTACHED

2014-09-10 Thread Joe Lawrence
On Tue, 9 Sep 2014 14:29:11 -0400 Alan Stern wrote: > On Tue, 9 Sep 2014, Joe Lawrence wrote: > > > On Tue, 9 Sep 2014 11:30:24 -0400 > > Alan Stern wrote: > > > > > On Tue, 9 Sep 2014, Joe Lawrence wrote: > > > > > > In summary, khu

[PATCH] usb: hub: take hub->hdev reference when processing from eventlist

2014-09-10 Thread Joe Lawrence
_device(hdev) ... usb_unlock_device(hdev) usb_put_dev(hdev) kref_put(&hub->kref, hub_release) No reports from slub_debug during last night's tests. -->8-- -->8-- From 5f169da5fbdb6374dc23e8202a7a06fd27196a07 Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Tue, 9 Sep 2014 17:24:4

Re: crash in recursively_mark_NOTATTACHED

2014-09-09 Thread Joe Lawrence
On Tue, 9 Sep 2014 11:30:24 -0400 Alan Stern wrote: > On Tue, 9 Sep 2014, Joe Lawrence wrote: > > In summary, khubd has initialized the usb_device maxchild to 8 and > > provided backing-store for the usb_hub ports[] array. However, before > > it gets to fill in pointers

use after free in hub_events

2014-09-09 Thread Joe Lawrence
ce. In my traces, that's too late as it's already been freed and poisoned. There's probably a better way to coordinate these two functions, but the following change (on top of the one in my other mail) has run our device removal tests without incident thus far. Thanks, -- Joe -->8-- --

crash in recursively_mark_NOTATTACHED

2014-09-09 Thread Joe Lawrence
completely solves the issue at hand, or simply covers up the crash. Comments welcome. Regards, -- Joe -->8-- -->8-- From abb49e02e2f56ed1528198dfe242a9dd3041dc79 Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Fri, 5 Sep 2014 15:02:29 -0400 Subject: [PATCH] usb: hub: protect recursively_mark_NOTATTACHED and hal