On Fri, Jun 09, 2017 at 07:19:34PM +0200, Björn Ketelaars wrote:
> On Fri 09/06/2017 12:07, Martin Pieuchot wrote:
> > On 08/06/17(Thu) 20:38, Björn Ketelaars wrote:
> > > On Thu 08/06/2017 16:55, Martin Pieuchot wrote:
> > > > On 07/06/17(Wed) 09:43, Björn Ketelaars wrote:
> > > > > On Sat 03/06/2017 08:44, Björn Ketelaars wrote:
> > > > > >
> > > > > > Reverting back to the previous kernel fixed the issue above.
> > > > > > Question: can someone give a hint on how to track this issue?
> > > > >
> > > > > After a bit of experimenting I'm able to reproduce the problem.
> > > > > Summary is that queueing in pf and use of a current (after May 30),
> > > > > multi processor kernel (bsd.mp from snapshots) causes these specific
> > > > > watchdog timeouts followed by a system freeze.
> > > > >
> > > > > Issue is 'gone' when:
> > > > > 1.) using an older kernel (before May 30);
> > > > > 2.) removal of queueing statements from pf.conf. Included below the
> > > > >     specific snippet;
> > > > > 3.) switch from MP kernel to SP kernel.
> > > > >
> > > > > New observation is that while queueing, using a MP kernel, the
> > > > > download bandwidth is only a fraction of what is expected. Exchanging
> > > > > the MP kernel with a SP kernel restores the download bandwidth to
> > > > > expected level.
> > > > >
> > > > > I'm guessing that this issue is related to recent work on PF?
> > > >
> > > > It's certainly a problem in, or exposed by, re(4) with the recent MP
> > > > work in the network stack.
> > > >
> > > > It would help if you could build a kernel with MP_LOCKDEBUG defined and
> > > > see if the resulting kernel enters ddb(4) instead of freezing.
> > > >
> > > > Thanks,
> > > > Martin
> > >
> > > Thanks for the hint! It helped in entering ddb. I collected a bit of
> > > output, which you can find below. If I read the trace correctly the
> > > crash is related to line 1750 of sys/dev/ic/re.c:
> > >
> > > d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
> >
> > Could you test the diff below, always with a MP_LOCKDEBUG kernel and
> > tell us if you can reproduce the freeze or if the kernel enters ddb(4)?
> >
> > Another question, how often do you see "watchdog timeout" messages?
> >
> > Index: re.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/ic/re.c,v
> > retrieving revision 1.201
> > diff -u -p -r1.201 re.c
> > --- re.c	24 Jan 2017 03:57:34 -0000	1.201
> > +++ re.c	9 Jun 2017 10:04:43 -0000
> > @@ -2074,9 +2074,6 @@ re_watchdog(struct ifnet *ifp)
> >  	s = splnet();
> >  	printf("%s: watchdog timeout\n", sc->sc_dev.dv_xname);
> >  
> > -	re_txeof(sc);
> > -	re_rxeof(sc);
> > -
> >  	re_init(ifp);
> >  
> >  	splx(s);
>
> The diff (with a MP_LOCKDEBUG kernel) resulted in similar traces as before.
> ddb Output is included below.
>
> With your diff the number of timeout messages decreased from 9 to 2 before
> entering ddb.
Can you try the diff below please?

Index: hfsc.c
===================================================================
RCS file: /cvs/src/sys/net/hfsc.c,v
retrieving revision 1.39
diff -u -p -r1.39 hfsc.c
--- hfsc.c	8 May 2017 11:30:53 -0000	1.39
+++ hfsc.c	12 Jun 2017 05:08:01 -0000
@@ -817,7 +817,7 @@ hfsc_deferred(void *arg)
 	KASSERT(HFSC_ENABLED(ifq));
 
 	if (!ifq_empty(ifq))
-		(*ifp->if_qstart)(ifq);
+		ifq_start(ifq);
 
 	hif = ifq->ifq_q;