On 30/08/18(Thu) 18:15, Tom Murphy wrote:
> On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote:
> > On 30/08/18(Thu) 14:00, Tom Murphy wrote:
> > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote:
> > > > On 28/08/18(Tue) 22:22, Tom Murphy wrote:
> > > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> > > > > > Hello Tom,
> > > > > >
> > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > > > > > > On 2018-08-25 21:40:57, Tom Murphy <[email protected]> wrote:
> > > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > > > > > > > I've narrowed it down.
> > > > > > > > > >
> > > > > > > > > >Last kernel where adb works: June 24 09:59:46 MDT 2018
> > > > > > > > > >1st Kernel where adb panics: June 25 13:10:32 MDT 2018
> > > > > >
> > > > > > The real problem is in the xhci(4) driver. When a command with a
> > > > > > timeout is submitted we should ensure no other command is enqueued
> > > > > > before continuing. Sadly the driver did not include any mechanism
> > > > > > to serialize command submissions. Diff below does that and should
> > > > > > fix your problem.
> > > > > >
> > > > > > Can you try it on top of -current? Make sure you have no diff
> > > > > > reverted.
> > > > >
> > > > > Hi,
> > > > >
> > > > > I think I spoke a little too soon. I found a case where it
> > > > > started printing xhci0: command timeout over and over until
> > > > > eventually the kernel panics with a protection fault. I couldn't
> > > > > catch the backtrace properly, but it looked around the same area
> > > > > as this original bug report.
> > > >
> > > > Without backtrace I can't make progress.
> > >
> > > Apologies for the delay. Just found time to reproduce this. Here's
> > > a backtrace:
> >
> > Almost, can you send the full dmesg with the backtrace at the end?
>
> Hi,
>
> Sorry, here's the dmesg with the backtrace.

Is this the live dmesg? I don't see any 'xhci0: command timeout' in it. By the way, that message doesn't exist in the source, so I can't tell which code path is triggering the problem.

Could you build a kernel with XHCI_DEBUG enabled, reproduce the page fault, and send the dmesg (at least the last 10 lines before the crash) along with the trace?
