Re: re0 and re1 watchdog timeouts, and system freeze

David Gwynne Sun, 11 Jun 2017 22:11:46 -0700

On Fri, Jun 09, 2017 at 07:19:34PM +0200, Björn Ketelaars wrote:
> On Fri 09/06/2017 12:07, Martin Pieuchot wrote:
> > On 08/06/17(Thu) 20:38, Björn Ketelaars wrote:
> > > On Thu 08/06/2017 16:55, Martin Pieuchot wrote:
> > > > On 07/06/17(Wed) 09:43, Björn Ketelaars wrote:
> > > > > On Sat 03/06/2017 08:44, Björn Ketelaars wrote:
> > > > > > 
> > > > > > > Reverting back to the previous kernel fixed the issue above.
> > > > > > > Question: can someone give a hint on how to track this issue?
> > > > > 
> > > > > > After a bit of experimenting I'm able to reproduce the problem. In
> > > > > > summary, queueing in pf combined with a current (after May 30)
> > > > > > multiprocessor kernel (bsd.mp from snapshots) causes these specific
> > > > > > watchdog timeouts, followed by a system freeze.
> > > > > 
> > > > > Issue is 'gone' when:
> > > > > 1.) using an older kernel (before May 30);
> > > > > > 2.) removing the queueing statements from pf.conf (the specific
> > > > > >     snippet is included below);
> > > > > > 3.) switching from the MP kernel to the SP kernel.
> > > > > 
> > > > > > A new observation: while queueing with an MP kernel, the download
> > > > > > bandwidth is only a fraction of what is expected. Replacing the MP
> > > > > > kernel with an SP kernel restores the download bandwidth to the
> > > > > > expected level.
> > > > > 
> > > > > I'm guessing that this issue is related to recent work on PF?
> > > > 
> > > > > It's certainly a problem in, or exposed by, re(4) with the recent MP
> > > > > work in the network stack.
> > > > 
> > > > It would help if you could build a kernel with MP_LOCKDEBUG defined and
> > > > see if the resulting kernel enters ddb(4) instead of freezing.
> > > > 
> > > > Thanks,
> > > > Martin
> > > 
> > > > Thanks for the hint! It helped in entering ddb. I collected a bit of
> > > > output, which you can find below. If I read the trace correctly the
> > > > crash is related to line 1750 of sys/dev/ic/re.c:
> > > 
> > >   d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
> > 
> > > Could you test the diff below, again with an MP_LOCKDEBUG kernel, and
> > > tell us if you can reproduce the freeze or if the kernel enters ddb(4)?
> > 
> > Another question, how often do you see "watchdog timeout" messages?
> > 
> > Index: re.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/ic/re.c,v
> > retrieving revision 1.201
> > diff -u -p -r1.201 re.c
> > --- re.c    24 Jan 2017 03:57:34 -0000      1.201
> > +++ re.c    9 Jun 2017 10:04:43 -0000
> > @@ -2074,9 +2074,6 @@ re_watchdog(struct ifnet *ifp)
> >     s = splnet();
> >     printf("%s: watchdog timeout\n", sc->sc_dev.dv_xname);
> >  
> > -   re_txeof(sc);
> > -   re_rxeof(sc);
> > -
> >     re_init(ifp);
> >  
> >     splx(s);
> 
> > The diff (with an MP_LOCKDEBUG kernel) produced traces similar to the
> > earlier ones. The ddb output is included below.
> 
> With your diff the number of timeout messages decreased from 9 to 2 before
> entering ddb.


can you try the diff below please?

Index: hfsc.c
===================================================================
RCS file: /cvs/src/sys/net/hfsc.c,v
retrieving revision 1.39
diff -u -p -r1.39 hfsc.c
--- hfsc.c      8 May 2017 11:30:53 -0000       1.39
+++ hfsc.c      12 Jun 2017 05:08:01 -0000
@@ -817,7 +817,7 @@ hfsc_deferred(void *arg)
        KASSERT(HFSC_ENABLED(ifq));
 
        if (!ifq_empty(ifq))
-               (*ifp->if_qstart)(ifq);
+               ifq_start(ifq);
 
        hif = ifq->ifq_q;
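
For readers following the thread, here is a rough sketch of why routing the
call through a serializing wrapper, rather than invoking the start routine
directly, can matter. Everything in it (struct toy_ifq, toy_ifq_start,
toy_start, toy_max_depth) is hypothetical and is not the OpenBSD
implementation. It uses a nested call in a single thread as a stand-in for
two contexts entering a driver's start routine at the same time, which is an
assumption about the failure mode and not something stated in the thread.

```c
#include <assert.h>

/* Hypothetical toy queue; this is NOT the OpenBSD struct ifqueue. */
struct toy_ifq {
	void (*qstart)(struct toy_ifq *);	/* driver start routine */
	int running;	/* nonzero while a caller is inside qstart */
	int pending;	/* set when a start request arrives while running */
	int depth;	/* current nesting depth of qstart */
	int max_depth;	/* deepest nesting observed */
	int runs;	/* how many times qstart ran */
	int reenter;	/* nested start requests the toy driver still issues */
	int direct;	/* nonzero: nested requests bypass the wrapper */
};

/* Serialized entry point: qstart never runs while it is already running. */
void
toy_ifq_start(struct toy_ifq *q)
{
	if (q->running) {
		q->pending = 1;	/* the active caller will run qstart again */
		return;
	}
	q->running = 1;
	do {
		q->pending = 0;
		q->qstart(q);
	} while (q->pending);
	q->running = 0;
}

/* Toy start routine: records nesting, then requests another start. */
void
toy_start(struct toy_ifq *q)
{
	q->runs++;
	if (++q->depth > q->max_depth)
		q->max_depth = q->depth;
	if (q->reenter > 0) {
		q->reenter--;
		if (q->direct)
			q->qstart(q);		/* direct call, as before the diff */
		else
			toy_ifq_start(q);	/* through the wrapper, as after it */
	}
	q->depth--;
}

/* Run the toy once and return the deepest nesting of the start routine. */
int
toy_max_depth(int direct)
{
	struct toy_ifq q = { .qstart = toy_start, .reenter = 1, .direct = direct };

	toy_ifq_start(&q);
	assert(q.runs == 2);	/* both requests are served either way */
	return q.max_depth;
}
```

In this toy, toy_max_depth(0) returns 1 because the second request is
deferred until the first run has finished, while toy_max_depth(1) returns 2
because the direct call re-enters the start routine while it is still
active. A real multiprocessor kernel would need proper locking or atomics,
which this sketch leaves out.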
