> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Koehrer Mathias (ETAS/ESW5)
> Sent: Thursday, October 06, 2016 12:02 AM
> To: Julia Cartwright <ju...@ni.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirs...@intel.com>; Greg <gvrose8...@gmail.com>
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-rt-
> us...@vger.kernel.org; Sebastian Andrzej Siewior
> <sebastian.siew...@linutronix.de>
> Subject: Re: [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb
> causes huge latencies in cyclictest
> 
> Hi all,
> >
> > Although, to be clear, it isn't the fact that there exist 8 threads; it's
> > that the device is firing all 8 interrupts at the same time.  The time
> > spent in hardirq context just waking up all 8 of those threads (and the
> > cyclictest wakeup) is enough to cause your regression.
> >
> > netdev/igb folks-
> >
> > Under what conditions should it be expected that the i350 trigger all of
> > the TxRx interrupts simultaneously?  Any ideas here?

I can answer that! I wrote that code.

We trigger the interrupts once a second because MSI and MSI-X interrupts are 
NOT guaranteed to be delivered. If one is lost, the queues being serviced by 
that "lost" interrupt are completely stuck.

The device automatically masks each interrupt vector after it fires, expecting 
the ISR to re-enable the vector after processing is complete. If the interrupt 
is lost, the ISR doesn't run, so the vector ends up permanently masked. At this 
point, any queues associated with that vector are stuck. The only recovery is 
through the netdev watchdog, which initiates a reset.
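
To make that concrete, here's roughly the shape of the per-vector code. The 
names (igb_msix_ring, igb_ring_irq_enable, EIMS, wr32) follow the igb driver, 
but take this as a simplified sketch rather than the verbatim source:

static irqreturn_t igb_msix_ring(int irq, void *data)
{
	struct igb_q_vector *q_vector = data;

	/* The hardware has already auto-masked this vector (via EIAM/EIAC).
	 * All we do in hardirq context is kick NAPI; the vector stays
	 * masked until the poll routine finishes and re-arms it below. */
	napi_schedule(&q_vector->napi);

	return IRQ_HANDLED;
}

/* Called at the end of the NAPI poll, once the queue work is done: */
static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
{
	struct e1000_hw *hw = &q_vector->adapter->hw;

	/* Re-arm just this vector.  If the MSI-X message was lost, we never
	 * get here, and the vector stays masked forever. */
	wr32(E1000_EIMS, q_vector->eims_value);
}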

During development of igb, we had several platforms with chipsets that 
routinely dropped MSI messages under stress. Things would be running fine and 
then, pow, all the traffic on a queue would stop. 

So, I added code to fire each vector once per second. Just unmasking the 
interrupt isn't enough - we need to trigger the ISR to get the queues cleaned 
up so the device can work again.
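
The kick itself is a single register write from the watchdog. A sketch, 
modeled on igb_watchdog_task() in igb_main.c:

	struct e1000_hw *hw = &adapter->hw;

	if (adapter->flags & IGB_FLAG_HAS_MSIX) {
		u32 eics = 0;
		int i;

		/* Collect the EIMS bit of every queue vector... */
		for (i = 0; i < adapter->num_q_vectors; i++)
			eics |= adapter->q_vector[i]->eims_value;

		/* ...and fire them all with a single write to EICS.  This
		 * is the once-a-second burst of simultaneous interrupts
		 * seen in this thread. */
		wr32(E1000_EICS, eics);
	} else {
		/* legacy/MSI: fire the shared interrupt instead */
		wr32(E1000_ICS, E1000_ICS_RXDMT0);
	}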

Is this workaround still needed? I don't know. Modern chipsets don't break a 
sweat handling gigabit-speed traffic, and they *probably* don't drop 
interrupts. But I'd still rather have that insurance.

You could try removing the write to the EICS register in the watchdog task 
(the wr32(E1000_EICS, eics) line in the sketch above) to see if that takes 
care of your problem. But I wouldn't want to remove that code permanently, 
because we have seen lost interrupts in the past.

You could also try staggering the writes so that not all vectors fire in the 
same tick; see the sketch below. But then a lost interrupt can go unserviced 
for much longer, which means you could trigger netdev watchdog events.
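
An untested sketch of what that staggering could look like: kick one vector 
per watchdog tick, round-robin, instead of all of them at once. The next_kick 
counter is invented for illustration; it would want to live in struct 
igb_adapter:

	struct e1000_hw *hw = &adapter->hw;

	if (adapter->flags & IGB_FLAG_HAS_MSIX && adapter->num_q_vectors) {
		struct igb_q_vector *qv;

		qv = adapter->q_vector[adapter->next_kick %
				       adapter->num_q_vectors];
		adapter->next_kick++;	/* invented field, see above */

		/* Fire only this vector.  A lost interrupt on another
		 * vector now waits up to num_q_vectors seconds for its
		 * turn, long enough to risk a netdev watchdog timeout. */
		wr32(E1000_EICS, qv->eims_value);
	}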

-Mitch



> >
> > See the start of this thread here:
> >
> >   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-MBX1012.de.bosch.com
> >
> Greg recommended using "ethtool -L eth2 combined 1" to reduce the number of
> queues.
> I tried that. Now I actually have only three irqs (eth2, eth2-rx-0, eth2-tx-0).
> However, the issue remains the same.
> 
> I ran the cyclictest again:
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
> (Note: when using 105us instead of 100us, the long latencies seem to occur
> more often.)
> 
> Here are the final lines of the kernel trace output:
>   <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56 next_prio=98
> ktimerso-46      3d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
> ktimerso-24      1d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
> ktimerso-79      6d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
> ktimerso-35      2d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
>   rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5 prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5 next_pid=68 next_prio=98
>   rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7 prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7 next_pid=90 next_prio=98
> ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt pid=8 prio=98 target_cpu=000
>   rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4 prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4 next_pid=57 next_prio=98
> ktimerso-4       0d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==> next_comm=rcu_preempt next_pid=8 next_prio=98
> ktimerso-90      7d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120
> ktimerso-68      5d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
> rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10 prio=120 target_cpu=000
> ktimerso-57      4d...2.. 1344661652us : sched_switch: prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120
> rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10 next_prio=120
>  rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0 prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx- pid=6310 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx- pid=6309 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=6310 next_prio=49
> irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=6309 next_prio=49
> irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency threshold (39 > 23)
> 
> Just before the long latency, the irqs "48-eth2-tx" and "47-eth2-rx" are
> active.
> Looking at the 4th line from the bottom, the timestamp for irq/47 is
> 1344662260us; on the next line (kworker) it is 1344662300us.
> Does this mean that irq/47 took 40us for irq processing? Or is this a
> misinterpretation?
> 
> For more lines of the trace please see the attached trace-extract.gz.
> 
> Thanks for any feedback.
> 
> Regards
> 
> Mathias
> 
