We are debugging an issue with netconsole and ixgbe where ksoftirqd takes 100%
of a core. It happens with both the current net and net-next trees.
To reproduce the issue:
1. Set up a server with ixgbe and netconsole. We bind each queue to a separate
   core via smp_affinity.
2. Start a simple netperf job from the client, e.g.:
     ./super_netperf 201 -P 0 -t TCP_RR -p 8888 -H <SERVER> -l 7200 -- \
       -r 300,300 -o -s 1M,1M -S 1M,1M
3. On the server, write to /dev/kmsg in a loop (so netconsole keeps sending
   messages):
     for x in {1..7200} ; do echo aa >> /dev/kmsg ; sleep 1 ; done
4. On the server, monitor ksoftirqd in top.
Within a few minutes, top will show one ksoftirqd taking 100% of a core for
many seconds in a row.
When ksoftirqd takes 100% of a core, the driver hits the "clean_complete =
false" path below, so the napi stays in polling mode:
	ixgbe_for_each_ring(ring, q_vector->rx) {
		int cleaned = ixgbe_clean_rx_irq(q_vector, ring,
						 per_ring_budget);

		work_done += cleaned;
		if (cleaned >= per_ring_budget)
			clean_complete = false;
	}

	/* If all work not completed, return budget and keep polling */
	if (!clean_complete)
		return budget;
We didn't see this issue on a 4.6-based kernel.
We are still debugging the issue, but we would like to check whether there is
a known solution for it. Any comments and suggestions are highly appreciated.
Best,
Song