Hi Hillf,

Unfortunately, the above memory barriers don't help. The issue shows up within 1 minute ...
On Thu, Aug 27, 2020 at 8:58 PM Hillf Danton <hdan...@sina.com> wrote:
>
> On Thu, 27 Aug 2020 14:56:31 +0800 Kehuan Feng wrote:
> >
> > > Let's see if TCQ_F_NOLOCK is making fq_codel different in your testing.
> >
> > I assume you meant disabling NOLOCK for pfifo_fast.
> >
> > Here is the modification,
> >
> > --- ./net/sched/sch_generic.c.orig	2020-08-24 22:02:04.589830751 +0800
> > +++ ./net/sched/sch_generic.c	2020-08-27 10:17:10.148977195 +0800
> > @@ -792,7 +792,7 @@
> >  	.dump			=	pfifo_fast_dump,
> >  	.change_tx_queue_len	=	pfifo_fast_change_tx_queue_len,
> >  	.owner			=	THIS_MODULE,
> > -	.static_flags		=	TCQ_F_NOLOCK | TCQ_F_CPUSTATS,
> > +	.static_flags		=	TCQ_F_CPUSTATS,
> >
> > The issue never happened again with it over 3 hours of stressing, and I
> > restarted the test two times. No surprises. Quite stable...
>
> Jaw-dropping. That is great news, and I'm again failing to explain the test
> result w.r.t. the difference TCQ_F_NOLOCK can make in a running qdisc.
>
> Nothing comes to mind other than two memory barriers, though only one is
> needed...
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3040,6 +3040,7 @@ static void __netif_reschedule(struct Qd
>
>  void __netif_schedule(struct Qdisc *q)
>  {
> +	smp_mb__before_atomic();
>  	if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
>  		__netif_reschedule(q);
>  }
> @@ -4899,6 +4900,7 @@ static __latent_entropy void net_tx_acti
>  		 */
>  		smp_mb__before_atomic();
>  		clear_bit(__QDISC_STATE_SCHED, &q->state);
> +		smp_mb__after_atomic();
>  		qdisc_run(q);
>  		if (root_lock)
>  			spin_unlock(root_lock);
>
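
For anyone trying to reason about the barrier question above, here is a small
userspace sketch (not kernel code, and not a proposed fix) that only mimics
the shape of the two paths the patch touches. All names in it (SCHED_BIT,
backlog, producer/consumer) are invented for the illustration:

/*
 * Illustrative userspace model of the pattern in __netif_schedule()
 * (publish work, then test-and-set the SCHED bit) and net_tx_action()
 * (clear the SCHED bit, then drain).
 *
 * In C11 the fetch_or/fetch_and below default to seq_cst, i.e. fully
 * ordered, which is roughly what the extra smp_mb__*() calls would add
 * around the kernel bitops on weakly ordered CPUs. In the kernel,
 * test_and_set_bit() is documented as fully ordered because it returns
 * a value, while clear_bit() is not, which is why net_tx_action()
 * already pairs it with smp_mb__before_atomic().
 *
 * Build: cc -O2 -pthread -std=c11 sched_bit_sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SCHED_BIT 0x1UL
#define NPKTS     100000

static atomic_ulong state;   /* bit 0 plays the role of __QDISC_STATE_SCHED */
static atomic_int   backlog; /* stands in for the qdisc's queued packets    */
static atomic_bool  done;    /* lets the sketch terminate                   */

/* Roughly __netif_schedule(): make the packet visible, then set SCHED.
 * If the bit was already set, we rely on the already-scheduled side seeing
 * the packet, which is only safe if the enqueue is ordered before the bit op. */
static void *producer(void *arg)
{
	(void)arg;
	for (int i = 0; i < NPKTS; i++) {
		atomic_fetch_add(&backlog, 1);      /* "enqueue"            */
		atomic_fetch_or(&state, SCHED_BIT); /* "test_and_set_bit()" */
	}
	atomic_store(&done, true);
	return NULL;
}

/* Roughly net_tx_action(): clear SCHED first, then run the qdisc.  If the
 * clear were reordered after the drain, a concurrent producer could still
 * see the bit set, skip rescheduling, and leave its packet stranded. */
static void *consumer(void *arg)
{
	(void)arg;
	for (;;) {
		atomic_fetch_and(&state, ~SCHED_BIT);   /* "clear_bit()" + fence */
		int n;
		while ((n = atomic_load(&backlog)) > 0) /* "qdisc_run()"         */
			atomic_compare_exchange_weak(&backlog, &n, n - 1);
		if (atomic_load(&done) && atomic_load(&backlog) == 0)
			break;
	}
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&p, NULL, producer, NULL);
	pthread_create(&c, NULL, consumer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	printf("leftover backlog: %d (expected 0)\n", atomic_load(&backlog));
	return 0;
}

This is only meant to make the ordering question concrete; whatever is going
wrong in the NOLOCK path clearly involves more state than a single counter,
given that the barriers above did not change the test result.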