Hi,

On 12/21/2016 08:48 PM, Mathias Nyman wrote:
> On 21.12.2016 08:17, Lu Baolu wrote:
>> Hi Mathias,
>>
>> I have some comments on the implementation of xhci_abort_cmd_ring() below.
>>
>> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>>> On 20.12.2016 09:30, Baolin Wang wrote:
>>> ...
>>>
>>> Alright, I gathered all current work related to xhci races and timeouts
>>> and put it into a branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
>>> timeout_race_fixes
>>>
>>> It's based on 4.9.
>>> It includes a few other patches just to avoid conflicts and make my life
>>> easier.
>>>
>>> Interesting patches are:
>>>
>>> ee4eb91 xhci: remove unnecessary check for pending timer
>>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>>> 4f2535f xhci: Handle command completion and timeout race
>>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>>> 529a5a0 usb: xhci: fix possible wild pointer
>>> 4766555 xhci: Fix race related to abort operation
>>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>>> 69973b8 Linux 4.9
>>>
>>> The fixes for command queue races will go to usb-linus and stable; the
>>> reworks for the stop ep watchdog timer will go to usb-next.
>>>
>>> Still completely untested (well, it compiles).
>>>
>>> Felipe gave instructions on how to modify the dwc3 driver to time out on
>>> address device commands to test these; I'll try to set that up.
>>>
>>> All additional testing is welcome, especially if you can trigger timeouts
>>> and races.
>>>
>>> -Mathias
>>>
>>
>> Below is the latest code. I put my comments inline.
>>
>> static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>> {
>> 	u64 temp_64;
>> 	int ret;
>>
>> 	xhci_dbg(xhci, "Abort command ring\n");
>>
>> 	reinit_completion(&xhci->cmd_ring_stop_completion);
>>
>> 	temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>> 	xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>> 			&xhci->op_regs->cmd_ring);
>>
>> We should hold xhci->lock when we are modifying xhci registers
>> at runtime.
>>
>
> Makes sense, but we need to unlock it before sleeping or waiting for
> completion.
> I need to look into that in more detail.
>
> But this was an issue already before these changes.
>
>> The retry of setting CMD_RING_ABORT is not necessary according to
>> previous discussion. We have cleaned up the code for the second try in
>> xhci_handle_command_timeout(). Need to clean up here as well.
>>
>
> Yes, it can be cleaned up as well, but the two cases are a bit different.
> The cleaned-up one was about the command ring not starting again after it
> was stopped.
>
> This second try is a workaround for what we thought was the command ring
> failing to stop in the first place, but it is most likely due to the race
> that OGAWA Hirofumi fixed. It races if the stop command ring interrupt
> happens between writing the abort bit and polling for the ring stopped bit.
> The interrupt handler may start the command ring again, and we would believe
> we failed to stop it in the first place.
>
> This race could probably be fixed by just extending the lock (and preventing
> interrupts) to cover both writing the abort bit and polling for the command
> ring running bit, as you pointed out here previously.
>
> But then again, I really like OGAWA Hirofumi's solution that separates
> stopping the command ring from aborting commands and restarting the ring.
>
> The current way of always restarting the command ring in response to
> a stop command ring command really limits its usage.
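The race and the proposed lock extension can be illustrated with a minimal userspace model in C. This is a sketch, not driver code. Everything in it is hypothetical and invented for illustration: `struct model_hcd`, the `MODEL_*` bit positions, and all the helper functions. A pthread mutex stands in for the `xhci->lock` spinlock (the driver would also have interrupts disabled), and a thread stands in for an interrupt handler that restarts the ring when it sees a stop event. Because the abort write and the CRR check happen under one hold of the lock, the handler can only run before or after that pair, never between them.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions for this model, not taken from the driver. */
#define MODEL_CMD_RING_ABORT   (1u << 2)
#define MODEL_CMD_RING_RUNNING (1u << 3)

/* Hypothetical model of the controller; not the real struct xhci_hcd. */
struct model_hcd {
	pthread_mutex_t lock;   /* plays the role of xhci->lock */
	uint32_t crcr;          /* plays the role of op_regs->cmd_ring */
	bool stop_event;        /* "command ring stopped" event pending for the handler */
};

/* Simulated hardware: writing the abort bit stops the ring and posts an event. */
static void model_write_crcr(struct model_hcd *h, uint32_t val)
{
	if (val & MODEL_CMD_RING_ABORT) {
		h->crcr = val & ~(MODEL_CMD_RING_ABORT | MODEL_CMD_RING_RUNNING);
		h->stop_event = true;
	} else {
		h->crcr = val;
	}
}

/* Simulated interrupt handler with the old behaviour: restart the ring on a stop event. */
static void *model_irq_handler(void *arg)
{
	struct model_hcd *h = arg;

	pthread_mutex_lock(&h->lock);
	if (h->stop_event) {
		h->stop_event = false;
		h->crcr |= MODEL_CMD_RING_RUNNING;
	}
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/* Write the abort bit and check CRR under a single hold of the lock. */
static bool model_abort_cmd_ring(struct model_hcd *h)
{
	bool stopped;

	pthread_mutex_lock(&h->lock);
	model_write_crcr(h, h->crcr | MODEL_CMD_RING_ABORT);
	/* The handler cannot run here, so it cannot restart the ring before we look. */
	stopped = !(h->crcr & MODEL_CMD_RING_RUNNING);
	pthread_mutex_unlock(&h->lock);
	return stopped;
}

/* Run the abort concurrently with the handler; report whether the abort saw the ring stopped. */
static bool run_model(void)
{
	struct model_hcd h = { .crcr = MODEL_CMD_RING_RUNNING, .stop_event = false };
	pthread_t irq;
	bool stopped;

	pthread_mutex_init(&h.lock, NULL);
	if (pthread_create(&irq, NULL, model_irq_handler, &h) != 0) {
		pthread_mutex_destroy(&h.lock);
		return false;
	}
	stopped = model_abort_cmd_ring(&h);
	pthread_join(irq, NULL);
	pthread_mutex_destroy(&h.lock);
	return stopped;
}
```

With the lock held across both steps, `run_model()` should always report the ring as stopped, whichever order the threads run in. Without the lock, the handler could restart the ring between the write and the check, which produces the false "failed to stop" result described above.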
>
> So, with this in mind, the most reasonable plan would be to:
> 1. fix the lock to cover the abort + CRR check, and send it to usb-linus + stable
> 2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only
> 3. remove the unnecessary second abort try as a separate patch, send to usb-next
> 4. remove polling for Command Ring Running (CRR); waiting for the completion
>    is enough, and if the completion times out we can check CRR. Send to usb-next.
>
> I'll fix the typos these patches would introduce. Fixing old typos can be
> done as separate patches later.
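Item 4 replaces CRR polling with a wait on the completion and reads CRR only if that wait times out. The control flow can be sketched in userspace. In this sketch, a pthread mutex and condition variable stand in for the kernel's `struct completion` and `wait_for_completion_timeout()`. `struct model_completion`, `model_complete()`, `model_wait_for_stop()` and the `MODEL_CRR` bit position are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MODEL_CRR (1u << 3) /* Command Ring Running bit; position assumed for this model */

/* Hypothetical stand-in for a struct completion plus the command ring register. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
	uint32_t crcr;
};

/* Called by the simulated interrupt handler when the "ring stopped" event arrives. */
static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	c->crcr &= ~MODEL_CRR;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Wait for the stop event; CRR is read only as a fallback after a timeout.
 * Returns 0 if the ring stopped, -ETIMEDOUT if it still appears to be running.
 */
static int model_wait_for_stop(struct model_completion *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	int ret;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to tolerate spurious wakeups; stop once done or the wait times out. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	/* No completion: check CRR once instead of polling it. */
	if (!c->done && (c->crcr & MODEL_CRR))
		ret = -ETIMEDOUT;
	else
		ret = 0;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

Consider a ring that is still running and never completes: `model_wait_for_stop()` returns `-ETIMEDOUT` after the timeout. After `model_complete()`, it returns 0 without waiting. The common path never polls a register, and the CRR read serves only as a diagnostic fallback.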

This is exactly the same as what I am thinking of. I will submit the patches later.

Best regards,
Lu Baolu