On 19 December 2016 at 20:13, Mathias Nyman <mathias.ny...@intel.com> wrote: > On 19.12.2016 13:34, Baolin Wang wrote: >> >> Hi Mathias, >> >> On 19 December 2016 at 18:33, Mathias Nyman >> <mathias.ny...@linux.intel.com> wrote: >>> >>> On 13.12.2016 05:21, Baolin Wang wrote: >>>> >>>> >>>> Hi Mathias, >>>> >>>> On 12 December 2016 at 23:52, Mathias Nyman >>>> <mathias.ny...@linux.intel.com> wrote: >>>>> >>>>> >>>>> On 05.12.2016 09:51, Baolin Wang wrote: >>>>>> >>>>>> >>>>>> >>>>>> If a command event is found on the event ring during an interrupt, >>>>>> we need to stop the command timer with del_timer(). Since del_timer() >>>>>> can fail if the timer is running and waiting on the xHCI lock, then >>>>>> it maybe get the wrong timeout command in >>>>>> xhci_handle_command_timeout() >>>>>> if host fetched a new command and updated the xhci->current_cmd in >>>>>> handle_cmd_completion(). For this situation, we need a way to signal >>>>>> to the command timer that everything is fine and it should exit. >>>>> >>>>> >>>>> >>>>> >>>>> Ah, right, this could actually happen. >>>>> >>>>>> >>>>>> >>>>>> We should introduce a counter (xhci->current_cmd_pending) for the >>>>>> number >>>>>> of pending commands. If we need to cancel the command timer and >>>>>> del_timer() >>>>>> succeeds, we decrement the number of pending commands. If del_timer() >>>>>> fails, >>>>>> we leave the number of pending commands alone. >>>>>> >>>>>> For handling timeout command, in xhci_handle_command_timeout() we will >>>>>> check >>>>>> the counter after decrementing it, if the counter >>>>>> (xhci->current_cmd_pending) >>>>>> is 0, which means xhci->current_cmd is the right timeout command. If >>>>>> the >>>>>> counter (xhci->current_cmd_pending) is greater than 0, which means >>>>>> current >>>>>> timeout command has been handled by host and host has fetched new >>>>>> command >>>>>> as >>>>>> xhci->current_cmd, then just return and wait for new current command. >>>>> >>>>> >>>>> >>>>> >>>>> A counter like this could work. >>>>> >>>>> Writing the abort bit can generate either ABORT+STOP, or just STOP >>>>> event, this seems to cover both. >>>>> >>>>> quick check, case 1: timeout and cmd completion at the same time. >>>>> >>>>> cpu1 cpu2 >>>>> >>>>> queue_command(first), p++ (=1) >>>>> queue_command(more), >>>>> --completion irq fires-- -- timer times out at same >>>>> time-- >>>>> handle_cmd_completion() handle_cmd_timeout(),) >>>>> lock(xhci_lock ) spin_on(xhci_lock) >>>>> del_timer() fail, p (=1, nochange) >>>>> cur_cmd = list_next(), p++ (=2) >>>>> unlock(xhci_lock) >>>>> lock(xhci_lock) >>>>> p-- (=1) >>>>> if (p > 0), exit >>>>> OK works >>>>> >>>>> case 2: normal timeout case with ABORT+STOP, no race. >>>>> >>>>> cpu1 cpu2 >>>>> >>>>> queue_command(first), p++ (=1) >>>>> queue_command(more), >>>>> handle_cmd_timeout() >>>>> p-- (P=0), don't exit >>>>> mod_timer(), p++ (P=1) >>>>> write_abort_bit() >>>>> handle_cmd_comletion(ABORT) >>>>> del_timer(), ok, p-- (p = 0) >>>>> handle_cmd_completion(STOP) >>>>> del_timer(), fail, (P=0) >>>>> handle_stopped_cmd_ring() >>>>> cur_cmd = list_next(), p++ (=1) >>>>> mod_timer() >>>>> >>>>> OK, works, and same for just STOP case, with the only difference that >>>>> during handle_cmd_completion(STOP) p is decremented (p--) >>>> >>>> >>>> >>>> Yes, that's the cases what I want to handle, thanks for your explicit >>>> explanation. >>>> >>> >>> Gave this some more thought over the weekend, and this implementation >>> doesn't solve the case when the last command times out and races with the >>> completion handler: >>> >>> cpu1 cpu2 >>> >>> queue_command(first), p++ (=1) >>> --completion irq fires-- -- timer times out at same time-- >>> handle_cmd_completion() handle_cmd_timeout(),) >>> lock(xhci_lock ) spin_on(xhci_lock) >>> del_timer() fail, p (=1, nochange) >>> no more commands, P (=1, nochange) >>> unlock(xhci_lock) >>> lock(xhci_lock) >>> p-- (=0) >>> p == 0, continue, even if we >>> should >>> not. >>> For this we still need to rely >>> on >>> checking cur_cmd == NULL in the timeout function. >>> (Baolus patch sets it to NULL if there are no more commands pending) >> >> >> As I pointed out in patch 1 of this patchset, this patchset is based >> on Lu Baolu's new fix patch: >> usb: xhci: fix possible wild pointer >> https://www.spinics.net/lists/linux-usb/msg150219.html >> >> After applying Baolu's patch, after decrement the counter, we will >> check the xhci->cur_command if is NULL. So in this situation: >> cpu1 cpu2 >> >> queue_command(first), p++ (=1) >> --completion irq fires-- -- timer times out at same >> time-- >> handle_cmd_completion() handle_cmd_timeout(),) >> lock(xhci_lock ) spin_on(xhci_lock) >> del_timer() fail, p (=1, nochange) >> no more commands, P (=1, nochange) >> unlock(xhci_lock) >> lock(xhci_lock) >> p-- (=0) >> no current command, return >> if (!xhci->current_cmd) { >> unlock(xhci_lock); >> return; >> } >> >> It can work. > > > Yes, > > What I wanted to say is that as it relies on Baolus patch for that one case > it seems that patch 2/2 can be replaced by a single line change: > > if (!xhci->current_cmd || timer_pending(&xhci->cmd_timer)) > > Right?
After checking the code again, I am agree with you. I will resend the patch with fixing this issue. Thanks. -- Baolin.wang Best Regards