Hello.

Mark Huth wrote:

 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void natsemi_poll_controller(struct net_device *dev)
 {
+     struct netdev_private *np = netdev_priv(dev);
+
      disable_irq(dev->irq);
-     intr_handler(dev->irq, dev);
+
+     /*
+      * A real interrupt might have already reached us at this point
+      * but NAPI might still haven't called us back.  As the
interrupt
+      * status register is cleared by reading, we should prevent an
+      * interrupt loss in this case...
+      */
+     if (!np->intr_status)
+             intr_handler(dev->irq, dev);
+
      enable_irq(dev->irq);

Oops, I was going to recast the patch but my attention switched elsewhere for couple of days, and it "slipped" into mainline. I'm now preparing a better patch to also protect...

Is it possible for this to run at the same time as the NAPI poll?  If so
then it is possible for the netpoll poll to run between np->intr_status
being cleared and netif_rx_complete() being called.  If the hardware
asserts an interrupt at the wrong moment then this could cause the

Well, there is a whole task of analyzing the netpoll conditions under smp. There appears to me to be a race with netpoll and NAPI on another processor, given that netpoll can be called with virtually any system condition on a debug breakpoint or crash dump initiation. I'm spending some time looking into it, but don't have a smoking gun immediately. Regardless, if such a condition does exist, it is shared across many or all of the potential netpolled devices. Since that is exactly the condition the suggested patch purports to solve, it is pointless if the whole NAPI/netpoll race exists. Such a race would lead to various and imaginative failures in the system. So don't fix that problem in a particular driver. If it exists, fix it generally in the netpoll/NAPI infrastructure.

   Any takers? :-)

In any case, this is a problem independently of netpoll if the chip
shares an interrupt with anything so the interrupt handler should be
fixed to cope with this situation instead.

Yes, that would appear so. If an interrupt line is shared with this device, then the interrupt handler can be called again, even though the device's interrupts are disabled on the interface. So, in the actual interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler.

It seems that there's no need to read it, as it gets set to 0 "synchronously" with setting the 'hands_off' flag (except in NAPI callback)...

Since the isr is disabled only by the interrupt handler, and enabled only by the poll routine, the race on the interrupt cause register is prevented. And, as a byproduct, the netpoll race is also prevented. You could just always read the isr enable hardware register, but that means you always do an operation to the chip, which can be painfully slow.

Yeah, it seems currently unjustified. However IntrEnable would have been an ultimate criterion on taking or ignoring an interrupt otherwise...

I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device.

   Didn't get it... :-/

If this results in netpoll not getting a packet right away, that's okay, since the netpoll users should try again.

Well, in certain stuations (like KGDBoE), netpoll callback being called *while* NAPI callback is being executed would mean a deadlock, I think (as NAPI callback will never complete)...
   BTW, it seems I've found another interrupt lossage path in the driver:

netdev_poll() -> netdev_rx() -> reset_rx()

If the netdev_rx() detects an oversized packet, it will call reset_rx() which will spin in a loop "collecting" interrupt status until it sees RxResetDone there. The issue is 'intr_status' field will get overwritten and interrupt status lost after netdev_rx() returns to netdev_poll(). How do you think, is this corner case worth fixing (by moving netdev_rx() call to the top of a do/while loop)?

Mark Huth

WBR, Sergei
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to