The latest version of the Intel e1000 driver use an implementation which
is incompatible with netpoll when compiled as NAPI enabled.
The netpoll_poll routines uses the parent netdev structure as the device
that gets polled:
poll_napi:
if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) &&
npinfo->poll_owner != smp_processor_id() &&
spin_trylock(&npinfo->poll_lock)) {
npinfo->rx_flags |= NETPOLL_RX_DROP;
atomic_inc(&trapped);
np->dev->poll(np->dev, &budget);
However, the e1000_intr routine schedules the napi poll based on a
pseudo device created by the driver.
e1000_intr:
static irqreturn_t
e1000_intr(int irq, void *data, struct pt_regs *regs)
{
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
...
#else /* if !CONFIG_E1000_MQ */
if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0])))
__netif_rx_schedule(&adapter->polling_netdev[0]);
else
e1000_irq_enable(adapter);
#endif /* CONFIG_E1000_MQ */
printks inserted confirm that data != &adapter->polling_netdev[0] and
the the parent netdev for the e1000 device is not on the poll list and
does not have the LINK_STATE_RX_SCHED bit set.
Hence the necessary LINK_STATE_RX_SCHED flag is not set in the parent
device's state field, and the e1000_clean routine is never called from
netpoll:napi_poll.
I haven't looked closely at the MQ code, but I doubt the situation
improves, as the whole idea with the implementation seems to be to allow
multiple pseudo devices, one for each queue.
This problem does not exist in the normal napi polling sequence, as the
poll list is walked, and whatever devices are on the list are polled,
irrespective of whether they were the actual parent devices or
driver-created pseudo devices.
The bug causes netdump to not work at all over e1000-napi, and probably
leads to stalls in the case of kgdboe or netconsole as well.
Unfortunately, it is not a simple coding error, but rather the case of
two designs being incompatible.
I would like comments on potential solutions, none of which seem
particularly appealing
1) Have netpoll:napi_poll walk the poll list like the normal dev.c code
does. This seems simplest. I am concerned that it lets too much
activity go on for kgdboe and netdump, since all devices get serviced
that are on the poll list at the time of the system freeze for debug/dump.
2) Fix the e1000 (and other drivers that may do similar things) to be
aware that it is scheduling napi-callback based on a net poll, falling
back into single queue operation off of the parent device only - simple
in concept, but possibly a good bit of change to the driver(s).
2.5) provide polled operation directly to the netif_rx path - probably
straightforward, but adds code to the driver and requires a means to
untangle any pending napi polls, which might be "interesting" in
multi-queue, smp, rt_pre-emptible situations.
3) Figure out a way for np device setup to become aware of the actual
device to be polled - perhaps an additional driver api - this may also
require some work of #2 above, but offers an advantage of keeping the
decisions out of the fast path. This requires work to several different
areas, including the netpoll setup api, the driver api, and every
driver, although the driver work should be relatively trivial except for
multiqueue devices.
4) Some brilliant 3 line change that makes everything work which I've
stupidly overlooked. (my preferred solution)
I'm willing to do the work after receiving some comments from the list.
Unfortunately, I have to fix this fairly soon, so I look forward to
comments in the next few days.
Thanks,
Mark Huth
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html