On Sun, Apr 12, 2020 at 01:58:18PM +0000, Mikolaj Kucharski wrote:
> On Sun, Apr 12, 2020 at 03:22:01PM +0200, Patrick Wildt wrote:
> > Maybe, make that printf conditional on something weird, like the
> > following. Since, if it runs off the DMA buffers map, maybe actlen
> > is broken.
> >
> > if (xfer->length < xfer->actlen)
> > printf("ehci_idon: ....
>
> Bingo. Kernel panic on first boot.
>
> ehci_idone: len=4, actlen=-15996, cerr=3, status=0x8c00 type=0x0
>
At the end is my debugging patch, so it's easier (hopefully) to
understand dmesg output just before kernel panic. Please be aware of
wrong indentations, but I wanted to make my changes against existing
code minimal. Below diff is with latest Patrick's changes in this
thread.
So, with below diff crash looks as follows (transcribed by hand):
urtwn0: RUN -> INIT
urtwn0: INIT -> SCAN
urtwn0: begin active scan
urtwn0: SCAN -> SCAN
urtwn0: sending probe_req to ff:ff:ff:ff:ff:ff on channel 2 mode auto
urtwn0: SCAN -> INIT
XXX ehci_idone: len=1, actlen=4294951305, xf_actlen=0, status=0x8c00
XXX ehci_idone: x_actlen=4294951304, x1=16000, x2=8
XXX ehci_idone: x_actlen=4294951305, x1=0, x2=1
XXX ehci_idone: x_actlen=4294951305, x1=0, x2=0
ehci_idone: len=1, actlen=4294951305, x_actlen=4294951305, cerr=3,
status=0x8c00 type=0x0
panic: _dmamap_sync: ran off map!
and the usual panic trace with ehci_idone() starts from that moment.
With above output I'm following this code path:
908 if (xfer->length < actlen) {
909 printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, "
910 "status=0x%x\n", xfer->length, actlen,
xfer->actlen, status);
911 x_actlen = 0;
912 for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
913 usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd),
914 BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
915 nstatus = letoh32(sqtd->qtd.qtd_status);
916 if (nstatus & EHCI_QTD_ACTIVE)
917 break;
918
919 status = nstatus;
920 /* halt is ok if descriptor is last, and complete */
921 if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) &&
922 EHCI_QTD_GET_BYTES(status) == 0)
923 status &= ~EHCI_QTD_HALTED;
924 if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) {
925 x1 = EHCI_QTD_GET_BYTES(status);
926 x2 = sqtd->len;
927 x_actlen += x2 - x1;
928 printf("XXX ehci_idone: x_actlen=%u, x1=%u,
x2=%u\n", x_actlen, x1, x2);
929 }
930 }
931 }
On the first iteration of above for loop x_actlen ends up as 4294951304,
in line 927 because EHCI_QTD_GET_BYTES(status) returns 16000, line 925
and sqtd->len returns 8 line 926. So EHCI_QTD_GET_BYTES(status) is
bigger than sqtd->len and I don't know is this expected, can this
happened. From that moment x_actlen will have large value and that
will lead to kernel panic.
I actually have seen this before also with urtwn(4), but don't
remember was that Pinebook or amd64, but dmesg says:
usb_transfer_complete: actlen > len 4294951172 > 4
That was couple of days, or weeks before recent changes to USB stack.
That message comes from usb_transfer_complete() file dev/usb/usbdi.c
around lines 738 to 744.
I'm writing this, so maybe someone can push me in the correct direction
how to help debug this further.
Index: sys/dev/usb/ehci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.210
diff -u -p -u -r1.210 ehci.c
--- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 -0000 1.210
+++ sys/dev/usb/ehci.c 13 Apr 2020 18:04:51 -0000
@@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer)
struct ehci_xfer *ex = (struct ehci_xfer *)xfer;
struct ehci_soft_qtd *sqtd;
u_int32_t status = 0, nstatus = 0;
- int actlen, cerr;
+ u_int32_t actlen, x_actlen;
+ int cerr;
+ u_int32_t x1, x2;
#ifdef DIAGNOSTIC
{
@@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer)
actlen += sqtd->len - EHCI_QTD_GET_BYTES(status);
}
+ if (xfer->length < actlen) {
+ printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, "
+ "status=0x%x\n", xfer->length, actlen, xfer->actlen,
status);
+ x_actlen = 0;
+ for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
+ usb_syncmem(&sqtd->dma, sqtd->offs, sizeof(sqtd->qtd),
+ BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
+ nstatus = letoh32(sqtd->qtd.qtd_status);
+ if (nstatus & EHCI_QTD_ACTIVE)
+ break;
+
+ status = nstatus;
+ /* halt is ok if descriptor is last, and complete */
+ if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) &&
+ EHCI_QTD_GET_BYTES(status) == 0)
+ status &= ~EHCI_QTD_HALTED;
+ if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) {
+ x1 = EHCI_QTD_GET_BYTES(status);
+ x2 = sqtd->len;
+ x_actlen += x2 - x1;
+ printf("XXX ehci_idone: x_actlen=%u, x1=%u, x2=%u\n",
x_actlen, x1, x2);
+ }
+ }
+ }
+
cerr = EHCI_QTD_GET_CERR(status);
- DPRINTFN(/*10*/2, ("ehci_idone: len=%d, actlen=%d, cerr=%d, "
+ DPRINTFN(/*10*/2, ("ehci_idone: len=%u, actlen=%u, cerr=%d, "
"status=0x%x\n", xfer->length, actlen, cerr, status));
xfer->actlen = actlen;
if ((status & EHCI_QTD_HALTED) != 0) {
@@ -912,13 +939,20 @@ ehci_idone(struct usbd_xfer *xfer)
xfer->status = USBD_STALLED;
else
xfer->status = USBD_IOERROR; /* more info XXX */
- } else
+ } else {
+ if (xfer->actlen) {
+ if (xfer->length < xfer->actlen) {
+ printf("ehci_idone: len=%u, actlen=%u, x_actlen=%u,
cerr=%d, "
+ "status=0x%x type=0x%x\n", xfer->length, actlen,
xfer->actlen, cerr,
+ status,
UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes));
+ }
+ usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
+ usbd_xfer_isread(xfer) ?
+ BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
+ }
xfer->status = USBD_NORMAL_COMPLETION;
+ }
- if (xfer->actlen)
- usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
- usbd_xfer_isread(xfer) ?
- BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
usb_transfer_complete(xfer);
DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
}
--
Regards,
Mikolaj