Andrew Morton wrote:
> 
> Manfred wrote:
> >
> > Hi Jeff, Tjeerd,
> >
> > I spotted the spin_lock in natsemi.c, and I think it's bogus.
> >
> > The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernel
> > (even Alan didn't remember it exactly when I asked him), thus a sane
> > driver can assume that an interrupt handler is never reentered.
> >
> > Donald often uses dev->interrupt to hide other races, but I don't see
> > anything in this driver (tx_timeout and netdev_timer are both trivial)
> 
> Hi, Manfed.
> 
> I think you're right.  2.4's interrupt handling prevents
> simultaneous entry of the same ISR.
> 
> However, natsemi.c's spinlock needs to be retained, and
> extended into start_tx(), because this driver has
> a race which has cropped up in a few others:
> 
> Current code:
> 
> start_tx()
> {
>         ...
>         if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
>         /* WINDOW HERE */
>                 np->tx_full = 1;
>                 netif_stop_queue(dev);
>         }
>         ...
> }
> 
> If the ring is currently full and an interrupt comes in
> at the indicated window and reaps ALL the packets in the
> ring, the driver ends up in state `tx_full = 1' and tramsmit
> disabled, but with no outstanding transmit interrupts.
> 
> It's screwed.  You need another interrupt so tx_full
> can be cleared and the queue can be restarted, but you can't
> *get* another interrupt because there are no Tx packets outstanding.
> 
> It's very unlikely to happen with this particular driver
> because it's also polling the transmit queue within
> receive interrupts.  Receiving a packet will clear
> the condition.
> 
> If you were madly hosing out UDP packets and receiving nothing
> then this could occur.  It was certainly triggerable in 3c59x.c,
> which doesn't test the Tx queue state in Rx interrupts.
> 
> I currently have natsemi.c lying in pieces on my garage floor,
> so I'll put this locking in if it's OK with everyone?

(entire message quoted, as it was from 12/23/2000)

Attached is a patch against 2.4.1-pre9, which includes the changes I
would prefer.  Comments?

The Tx locking is a bit conservative -- I think Donald suggested it
could be removed completely -- but I would prefer to have something I am
100% certain will work, and then test the driver without locking under
stress conditions to make sure no race or other bug exists.

        Jeff


-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie
Index: linux_2_4/drivers/net/natsemi.c
diff -u linux_2_4/drivers/net/natsemi.c:1.1.1.6 
linux_2_4/drivers/net/natsemi.c:1.1.1.6.24.1
--- linux_2_4/drivers/net/natsemi.c:1.1.1.6     Mon Dec 11 19:23:42 2000
+++ linux_2_4/drivers/net/natsemi.c     Sun Jan 21 06:39:02 2001
@@ -26,6 +26,11 @@
                - Bug fixes and better intr performance (Tjeerd)
        Version 1.0.2:
                - Now reads correct MAC address from eeprom
+       Version 1.0.3:
+               - Eliminate redundant priv->tx_full flag
+               - Call netif_start_queue from dev->tx_timeout
+               - wmb() in start_tx() to flush data
+               - Update Tx locking
 
 */
 
@@ -35,7 +40,7 @@
 static const char version2[] =
 "  http://www.scyld.com/network/natsemi.html\n";
 static const char version3[] =
-"  (unofficial 2.4.x kernel port, version 1.0.2, October 6, 2000 Jeff Garzik, Tjeerd 
Mulder)\n";
+"  (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd 
+Mulder)\n";
 /* Updated to recommendations in pci-skeleton v2.03. */
 
 /* Automatically extracted configuration info:
@@ -187,13 +192,14 @@
 
 The send packet thread has partial control over the Tx ring and 'dev->tbusy'
 flag.  It sets the tbusy flag whenever it's queuing a Tx packet. If the next
-queue slot is empty, it clears the tbusy flag when finished otherwise it sets
-the 'lp->tx_full' flag.
+queue slot is empty, it clears the tbusy flag when finished.  Under 2.4, the
+"tbusy flag" is now controlled by netif_{start,stop,wake}_queue() and tested
+by netif_queue_stopped().
 
 The interrupt handler has exclusive control over the Rx ring and records stats
 from the Tx ring.  After reaping the stats, it marks the Tx queue entry as
-empty by incrementing the dirty_tx mark. Iff the 'lp->tx_full' flag is set, it
-clears both the tx_full and tbusy flags.
+empty by incrementing the dirty_tx mark. Iff Tx queueing is stopped and Tx
+entries were reaped, the Tx queue is started and scheduled.
 
 IV. Notes
 
@@ -319,7 +325,6 @@
        unsigned int cur_rx, dirty_rx;          /* Producer/consumer ring indices */
        unsigned int cur_tx, dirty_tx;
        unsigned int rx_buf_sz;                         /* Based on MTU+slack. */
-       unsigned int tx_full:1;                         /* The Tx queue is full. */
        /* These values are keep track of the transceiver/media in use. */
        unsigned int full_duplex:1;                     /* Full-duplex operation 
requested. */
        unsigned int duplex_lock:1;
@@ -697,7 +702,7 @@
 
        dev->trans_start = jiffies;
        np->stats.tx_errors++;
-       return;
+       netif_start_queue(dev);
 }
 
 
@@ -707,7 +712,6 @@
        struct netdev_private *np = (struct netdev_private *)dev->priv;
        int i;
 
-       np->tx_full = 0;
        np->cur_rx = np->cur_tx = 0;
        np->dirty_rx = np->dirty_tx = 0;
 
@@ -763,11 +767,13 @@
        np->cur_tx++;
 
        /* StrongARM: Explicitly cache flush np->tx_ring and skb->data,skb->len. */
+       wmb();
 
-       if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
-               np->tx_full = 1;
+       spin_lock_irq(&np->lock);
+       if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1)
                netif_stop_queue(dev);
-       }
+       spin_unlock_irq(&np->lock);
+
        /* Wake the potentially-idle transmit channel. */
        writel(TxOn, dev->base_addr + ChipCmd);
 
@@ -798,9 +804,7 @@
 #endif
 
        ioaddr = dev->base_addr;
-       np = (struct netdev_private *)dev->priv;
-
-       spin_lock(&np->lock);
+       np = dev->priv;
 
        do {
                u32 intr_status = readl(ioaddr + IntrStatus);
@@ -818,6 +822,8 @@
                if (intr_status & (IntrRxDone | IntrRxErr | IntrRxIdle | 
IntrRxOverrun))
                        netdev_rx(dev);
 
+               spin_lock(&np->lock);
+
                for (; np->cur_tx - np->dirty_tx > 0; np->dirty_tx++) {
                        int entry = np->dirty_tx % TX_RING_SIZE;
                        if (np->tx_ring[entry].cmd_status & cpu_to_le32(DescOwn))
@@ -839,13 +845,14 @@
                        dev_kfree_skb_irq(np->tx_skbuff[entry]);
                        np->tx_skbuff[entry] = 0;
                }
-               if (np->tx_full
+               if (netif_queue_stopped(dev)
                        && np->cur_tx - np->dirty_tx < TX_QUEUE_LEN - 4) {
                        /* The ring is no longer full, wake queue. */
-                       np->tx_full = 0;
                        netif_wake_queue(dev);
                }
 
+               spin_unlock(&np->lock);
+
                /* Abnormal error summary/uncommon events handlers. */
                if (intr_status & IntrAbnormalSummary)
                        netdev_error(dev, intr_status);
@@ -873,8 +880,6 @@
                }
        }
 #endif
-
-       spin_unlock(&np->lock);
 }
 
 /* This routine is logically part of the interrupt handler, but separated

Reply via email to