Hi, Vincent: We tried my patch doesn't fix the issue totally, we still see the kernel panic. currently solution in our test is: Rvert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" and only add the below patch to use the in_interrupt() to avoid the wait_event_lock_irq() is called in interrupt context.
Subject: [PATCH] fix the wait_event_lock_irq run in interrupt context the patch is to fix the below bug: BUG: scheduling while atomic: SettingsProvide/3433/0x00000104 Preemption disabled at: [<ffffffff81e00053>] __do_softirq+0x53/0x31b CPU: 1 PID: 3433 Comm: SettingsProvide Tainted: P U ilt-2e5dc0ac-g3f662bf-dirty #8 Call Trace: <IRQ> dump_stack+0x70/0x9e ? __do_softirq+0x53/0x31b __schedule_bug+0x7f/0xd0 __schedule+0x61d/0x860 ? _raw_spin_unlock_irqrestore+0x28/0x50 ? prepare_to_wait_event+0x87/0x170 schedule+0x36/0x90 dwc3_gadget_ep_dequeue+0x27a/0x2e0 [dwc3] ? finish_wait+0x90/0x90 usb_ep_dequeue+0x23/0x90 ffs_aio_cancel+0x4c/0x70 kiocb_cancel+0x40/0x50 free_ioctx_users+0x6e/0x100 percpu_ref_switch_to_atomic_rcu+0x159/0x160 rcu_process_callbacks+0x1a7/0x520 __do_softirq+0x11a/0x31b irq_exit+0xb5/0xc0 smp_apic_timer_interrupt+0x7c/0x160 the BUG is introduced form the patch: commit: cf3113d89 usb: dwc3: gadget: properly increment dequeue pointer on ep_dequeue Signed-off-by: he, bo <bo...@intel.com> --- drivers/usb/dwc3/gadget.c | 112 +++++++++++++++++++++++----------------------- 1 file changed, 57 insertions(+), 55 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 7120678..386fd82 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1413,65 +1413,67 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, /* wait until it is processed */ dwc3_stop_active_transfer(dwc, dep->number, true); - /* - * If request was already started, this means we had to - * stop the transfer. With that we also need to ignore - * all TRBs used by the request, however TRBs can only - * be modified after completion of END_TRANSFER - * command. So what we do here is that we wait for - * END_TRANSFER completion and only after that, we jump - * over TRBs by clearing HWO and incrementing dequeue - * pointer. - * - * Note that we have 2 possible types of transfers here: - * - * i) Linear buffer request - * ii) SG-list based request - * - * SG-list based requests will have r->num_pending_sgs - * set to a valid number (> 0). Linear requests, - * normally use a single TRB. - * - * For each of these two cases, if r->unaligned flag is - * set, one extra TRB has been used to align transfer - * size to wMaxPacketSize. - * - * All of these cases need to be taken into - * consideration so we don't mess up our TRB ring - * pointers. - */ - wait_event_lock_irq(dep->wait_end_transfer, - !(dep->flags & DWC3_EP_END_TRANSFER_PENDING), - dwc->lock); - - if (!r->trb) - goto out0; - - if (r->num_pending_sgs) { - struct dwc3_trb *trb; - int i = 0; - - for (i = 0; i < r->num_pending_sgs; i++) { - trb = r->trb + i; - trb->ctrl &= ~DWC3_TRB_CTRL_HWO; - dwc3_ep_inc_deq(dep); - } + if(likely(!in_interrupt())) { + /* + * If request was already started, this means we had to + * stop the transfer. With that we also need to ignore + * all TRBs used by the request, however TRBs can only + * be modified after completion of END_TRANSFER + * command. So what we do here is that we wait for + * END_TRANSFER completion and only after that, we jump + * over TRBs by clearing HWO and incrementing dequeue + * pointer. + * + * Note that we have 2 possible types of transfers here: + * + * i) Linear buffer request + * ii) SG-list based request + * + * SG-list based requests will have r->num_pending_sgs + * set to a valid number (> 0). Linear requests, + * normally use a single TRB. + * + * For each of these two cases, if r->unaligned flag is + * set, one extra TRB has been used to align transfer + * size to wMaxPacketSize. + * + * All of these cases need to be taken into + * consideration so we don't mess up our TRB ring + * pointers. + */ + wait_event_lock_irq(dep->wait_end_transfer, + !(dep->flags & DWC3_EP_END_TRANSFER_PENDING), + dwc->lock); + + if (!r->trb) + goto out0; + + if (r->num_pending_sgs) { + struct dwc3_trb *trb; + int i = 0; + + for (i = 0; i < r->num_pending_sgs; i++) { + trb = r->trb + i; + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; + dwc3_ep_inc_deq(dep); + } + + if (r->unaligned || r->zero) { + trb = r->trb + r->num_pending_sgs + 1; + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; + dwc3_ep_inc_deq(dep); + } + } else { + struct dwc3_trb *trb = r->trb; - if (r->unaligned || r->zero) { - trb = r->trb + r->num_pending_sgs + 1; trb->ctrl &= ~DWC3_TRB_CTRL_HWO; dwc3_ep_inc_deq(dep); - } - } else { - struct dwc3_trb *trb = r->trb; - trb->ctrl &= ~DWC3_TRB_CTRL_HWO; - dwc3_ep_inc_deq(dep); - - if (r->unaligned || r->zero) { - trb = r->trb + 1; - trb->ctrl &= ~DWC3_TRB_CTRL_HWO; - dwc3_ep_inc_deq(dep); + if (r->unaligned || r->zero) { + trb = r->trb + 1; + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; + dwc3_ep_inc_deq(dep); + } } } goto out1; -- 2.7.4 -----Original Message----- From: Vincent Pelletier <plr.vinc...@gmail.com> Sent: Tuesday, September 25, 2018 8:46 PM To: He, Bo <bo...@intel.com> Cc: Felipe Balbi <felipe.ba...@linux.intel.com>; Alan Stern <st...@rowland.harvard.edu>; Roger Quadros <rog...@ti.com>; Lars-Peter Clausen <l...@metafoo.de>; Sam Protsenko <semen.protse...@linaro.org>; linux-usb@vger.kernel.org; Praneeth Bajjuri <prane...@ti.com>; Bai, Jie A <jie.a....@intel.com> Subject: Re: usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers Hello, On Thu, 2 Aug 2018 14:23:51 +0000, Vincent Pelletier <plr.vinc...@gmail.com> wrote: > On Thu, 2 Aug 2018 00:45:14 +0000, "He, Bo" <bo...@intel.com> wrote: > > Your patch fix the issue BUG: scheduling while atomic: > > Yes, although from my understanding of Felipe's answer, the actual bug > is the "scheduling" part (sleeping in dwc3 UDC) rather than the > "atomic" part. > > So my patch addresses, still if my understanding is correct, the wrong > half of the problem, and even introduced the regression you identified. > Hence my uncertainty... I notice that neither He's patch, nor a dwc3 change to prevent it from scheduling inside usb_ep_dequeue are in Linus' master. Please correct me if I missed something. Just in case my previous emails were not clear: - I have no objection to He's patch on its own (and I do not know the code nearly enough to provide a meaningful reviewed-by). - I do not intend to work on making changes to dwc3 gadget to stop it from scheduling in usb_ep_dequeue in the foreseeable future. So if there is no ongoing work on dwc3 cancel behaviour (Felipe ?), please do resume/carry on with reviewing and integrating He's patch. It is only *if* dwc3 cancel stops scheduling that I believe my patch should be reverted (here is the hash as of Linus' master): commit d52e4d0c0c428bf2ba35074a7495cdb28e2efbae Author: Vincent Pelletier <plr.vinc...@gmail.com> Date: Wed Jun 13 11:05:06 2018 +0000 usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers This bug happens only when the UDC needs to sleep during usb_ep_dequeue, as is the case for (at least) dwc3. and, in my understanding, a consequence is that He's fix would not be needed anymore - the bug my patch introduced disappearing in the revert. Regards, -- Vincent Pelletier