Re: [PATCH] natsemi: netpoll fixes

Sergei Shtylyov Sat, 10 Mar 2007 12:25:25 -0800

Hello.

Mark Huth wrote:

 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void natsemi_poll_controller(struct net_device *dev)
 {
+     struct netdev_private *np = netdev_priv(dev);
+
      disable_irq(dev->irq);
-     intr_handler(dev->irq, dev);
+
+     /*
+      * A real interrupt might have already reached us at this point
+      * but NAPI might still haven't called us back.  As the interrupt
+      * status register is cleared by reading, we should prevent an
+      * interrupt loss in this case...
+      */
+     if (!np->intr_status)
+             intr_handler(dev->irq, dev);
+
      enable_irq(dev->irq);

Oops, I was going to recast the patch but my attention switched elsewhere for a couple of days, and it "slipped" into mainline. I'm now preparing a better patch to also protect...
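For readers following the patch, here is a minimal user-space sketch of the interrupt loss it guards against. It is only a model, not the driver code: `fake_chip`, `fake_priv`, `read_isr()` and the two `poll_controller_*` variants are hypothetical stand-ins, and the only behavior assumed is what the patch comment states, namely that the status register is cleared by reading and that the handler latches it into `intr_status` for a later NAPI callback.

```c
#include <stdint.h>

/* Hypothetical stand-in for the chip: reading the status register clears it. */
struct fake_chip {
	uint32_t isr;		/* pending interrupt causes */
};

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	uint32_t intr_status;	/* status latched by the handler for the poll */
	uint32_t handled;	/* causes the poll has actually processed */
};

/* Read-to-clear access, as described in the patch comment. */
static uint32_t read_isr(struct fake_chip *chip)
{
	uint32_t v = chip->isr;

	chip->isr = 0;
	return v;
}

/* Models the handler: latch the status and leave it for the NAPI poll. */
static void fake_intr_handler(struct fake_chip *chip, struct fake_priv *np)
{
	np->intr_status = read_isr(chip);
}

/* Models the NAPI callback consuming the latched status. */
static void fake_napi_poll(struct fake_priv *np)
{
	np->handled |= np->intr_status;
	np->intr_status = 0;
}

/* Before the patch: the handler is always called, overwriting the latch. */
static void poll_controller_unpatched(struct fake_chip *chip,
				      struct fake_priv *np)
{
	fake_intr_handler(chip, np);
}

/* After the patch: skip the handler while latched status is still pending. */
static void poll_controller_patched(struct fake_chip *chip,
				    struct fake_priv *np)
{
	if (!np->intr_status)
		fake_intr_handler(chip, np);
}
```

In this model, if a real interrupt latches a cause and the poll controller runs before the NAPI callback, the unpatched variant re-reads an already cleared register and the latched cause never reaches the poll; the patched variant leaves it in place.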

Is it possible for this to run at the same time as the NAPI poll?  If so
then it is possible for the netpoll poll to run between np->intr_status
being cleared and netif_rx_complete() being called.  If the hardware
asserts an interrupt at the wrong moment then this could cause the

Well, there is a whole task of analyzing the netpoll conditions under SMP. There appears to me to be a race with netpoll and NAPI on another processor, given that netpoll can be called with virtually any system condition on a debug breakpoint or crash dump initiation. I'm spending some time looking into it, but don't have a smoking gun immediately. Regardless, if such a condition does exist, it is shared across many or all of the potential netpolled devices. Since that is exactly the condition the suggested patch purports to solve, it is pointless if the whole NAPI/netpoll race exists. Such a race would lead to various and imaginative failures in the system. So don't fix that problem in a particular driver. If it exists, fix it generally in the netpoll/NAPI infrastructure.


   Any takers? :-)

In any case, this is a problem independently of netpoll if the chip
shares an interrupt with anything so the interrupt handler should be
fixed to cope with this situation instead.

Yes, that would appear so. If an interrupt line is shared with this device, then the interrupt handler can be called again, even though the device's interrupts are disabled on the interface. So, in the actual interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler.

It seems that there's no need to read it, as it gets set to 0 "synchronously" with setting the 'hands_off' flag (except in NAPI callback)...

Since the isr is disabled only by the interrupt handler, and enabled only by the poll routine, the race on the interrupt cause register is prevented. And, as a byproduct, the netpoll race is also prevented. You could just always read the isr enable hardware register, but that means you always do an operation to the chip, which can be painfully slow.

Yeah, it seems currently unjustified. However IntrEnable would have been an ultimate criterion on taking or ignoring an interrupt otherwise...

I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device.


   Didn't get it... :-/
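To make the suggested handler check and its tradeoff concrete, here is a rough user-space sketch. Everything in it is a hypothetical model rather than the natsemi code: `napi_scheduled` stands in for the __LINK_STATE_SCHED bit in dev->state, `intr_enable` for the irq enable hardware register, and `reg_reads` merely counts simulated (slow) chip accesses.

```c
#include <stdbool.h>
#include <stdint.h>

enum fake_irqreturn { FAKE_IRQ_NONE = 0, FAKE_IRQ_HANDLED = 1 };

/* Hypothetical model of the device and driver state. */
struct fake_dev {
	bool napi_scheduled;	/* stands in for __LINK_STATE_SCHED */
	uint32_t intr_enable;	/* stands in for the irq enable register */
	uint32_t isr;		/* read-to-clear interrupt status */
	uint32_t intr_status;	/* copy latched for the NAPI poll */
	unsigned int reg_reads;	/* number of simulated chip accesses */
};

static enum fake_irqreturn fake_intr_handler(struct fake_dev *d)
{
	/* NAPI already owns the device: cannot be our interrupt, and we
	 * leave without touching the chip at all. */
	if (d->napi_scheduled)
		return FAKE_IRQ_NONE;

	/* Otherwise ask the chip whether its interrupts are enabled. */
	d->reg_reads++;
	if (!d->intr_enable)
		return FAKE_IRQ_NONE;

	/* Rest of the handler: latch the read-to-clear status, mask the
	 * chip's interrupts and hand over to the poll routine. */
	d->reg_reads++;
	d->intr_status = d->isr;
	d->isr = 0;
	d->intr_enable = 0;
	d->napi_scheduled = true;
	return FAKE_IRQ_HANDLED;
}
```

In this model, a call arriving over a shared line while NAPI is scheduled costs no register access and leaves the latched status untouched; the flag test is cheap, so the extra enable-register read only matters on calls where NAPI is not active.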

If this results in netpoll not getting a packet right away, that's okay, since the netpoll users should try again.

Well, in certain situations (like KGDBoE), netpoll callback being called *while* NAPI callback is being executed would mean a deadlock, I think (as NAPI callback will never complete)...

   BTW, it seems I've found another interrupt lossage path in the driver:

netdev_poll() -> netdev_rx() -> reset_rx()

If netdev_rx() detects an oversized packet, it will call reset_rx(), which will spin in a loop "collecting" interrupt status until it sees RxResetDone there. The issue is that the 'intr_status' field will get overwritten and the collected interrupt status lost after netdev_rx() returns to netdev_poll(). What do you think, is this corner case worth fixing (by moving the netdev_rx() call to the top of a do/while loop)?
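A small user-space simulation may help picture this path. It is a sketch under assumptions, not the driver source: the event bits, `struct sim` and the `sim_*` functions are hypothetical, and the poll loop is assumed to handle the latched status and then overwrite 'intr_status' with a fresh read-to-clear register read at the bottom of each iteration.

```c
#include <stdint.h>

/* Hypothetical event bits for the model. */
#define EV_RX		0x1u
#define EV_TX		0x2u
#define EV_RESET_DONE	0x4u

struct sim {
	uint32_t isr;			/* read-to-clear hardware status */
	uint32_t intr_status;		/* driver's latched copy */
	uint32_t raised_during_reset;	/* events firing while rx resets */
	int oversized;			/* next rx pass hits a bad packet */
	unsigned int tx_handled;	/* tx events actually processed */
};

static uint32_t sim_read_isr(struct sim *s)
{
	uint32_t v = s->isr;

	s->isr = 0;
	return v;
}

/* Models reset_rx(): spin, collecting status, until reset-done shows up. */
static void sim_reset_rx(struct sim *s)
{
	s->isr |= EV_RESET_DONE | s->raised_during_reset;
	s->raised_during_reset = 0;
	do {
		s->intr_status |= sim_read_isr(s);
	} while (!(s->intr_status & EV_RESET_DONE));
}

/* Models netdev_rx(): an oversized packet triggers the rx reset. */
static void sim_rx(struct sim *s)
{
	if (s->oversized) {
		s->oversized = 0;
		sim_reset_rx(s);
	}
}

/* Models the poll loop; rx_first selects the proposed ordering. */
static void sim_poll(struct sim *s, int rx_first)
{
	do {
		if (rx_first && (s->intr_status & EV_RX))
			sim_rx(s);
		if (s->intr_status & EV_TX)
			s->tx_handled++;
		if (!rx_first && (s->intr_status & EV_RX))
			sim_rx(s);
		/* Overwrites whatever sim_reset_rx() collected. */
		s->intr_status = sim_read_isr(s);
	} while (s->intr_status);
}
```

With rx handled last, a tx event collected inside the reset loop is overwritten before anything looks at it; with rx handled first, the same iteration still sees the collected bits.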

Mark Huth


WBR, Sergei
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
