Hi,
We have stumbled on a little problem with the tg3 driver. It would appear
that some RX ring entries without buffers are still used for receive when
memory allocation failures occur during initial RX ring setup, which
causes an oops.
The tg3 driver sets tp->rx_jumbo_ptr and MAILBOX_RCV_JUMBO_PROD_IDX
register to the full configured desired ring size (tp->rx_jumbo_pending),
even when it fails to initialize all skbs for that ring due to having run
out of memory. It does the same for the standard RX ring. tg3_init_rings()
calls tg3_alloc_rx_skb() repeatedly in a for loop to allocate an skb for
each ring entry. If an allocation fails, it simply breaks out of the loop
and returns silently. tg3_reset_hw() programs MAILBOX_RCV_STD_PROD_IDX and
MAILBOX_RCV_JUMBO_PROD_IDX to indicate the desired ring size, regardless
of whether or not all allocations actually succeeded. When the NIC
eventually receives a packet for an RX entry that has no buffer, it still
signals a receive event, and the tg3 driver oopses on a NULL pointer
dereference when it finds no skb for that entry.
The EIP points to a call to skb_put() in tg3_rx(). skb_put() invokes
SKB_LINEAR_ASSERT, which calls skb_is_nonlinear(), which reads
skb->data_len; since skb is NULL, that access faults.
We have encountered the problem a few times when switching from standard
to jumbo frames on a system that had been extensively used for about
5 days. Memory was probably very fragmented, and the multi-page jumbo
buffers couldn't all be allocated. I imagine hitting this problem with the
standard ring is pretty unlikely. OTOH the allocations are done with
GFP_ATOMIC, which makes failures plausible for jumbo buffers.
We saw the problem on kernel 2.4.27, but from the code I believe that
2.6.x and later 2.4.x kernels are also affected.
This bug is pretty hard to reproduce. I managed to reproduce and confirm
it by hacking tg3_init_rings(): making the for loop for jumbo sk_buffs
exit after 100 iterations instead of filling the desired 255 entries.
The first few jumbo packets are received properly, and eventually an oops
occurs.
A nasty aspect of the bug is that no error is returned during
initialization of the NIC and, assuming the problem is with the jumbo
ring, the NIC works normally with small packets. The oops only occurs
later, after one or a few jumbo packets have been received, which made it
hard to figure out that the problem lay in the initialization.
A possible fix, which I am using now, is to note the actual number of
buffers allocated, tell the hardware that correct number, and warn the
user. Patch below.
Some remaining issues:
-I believe the RX ring cannot grow back later if memory becomes available.
-IWBN (it would be nice) to use GFP_KERNEL for initialization, if the
locking can be rearranged.
-Performance will be degraded, and reception may not work at all if 0
buffers were allocated. There may also be a confusing case for the user
where the standard RX ring entries were allocated but the jumbo ones were
not, so that some packets are received and others are not. It might make
sense to fail with an error, at least when 0 buffers were allocated.
-tg3_change_mtu() would then need to handle failures.
Something like this patch would at least avoid the nasty oops.
Against 2.6.17.
Thank you
Signed-off-by: Stephane Doyon <[EMAIL PROTECTED]>
--- linux/drivers/net/tg3.c.orig 2006-07-12 12:20:33.000000000 -0400
+++ linux/drivers/net/tg3.c 2006-07-12 12:40:53.000000000 -0400
@@ -4143,7 +4143,8 @@ static void tg3_free_rings(struct tg3 *t
* end up in the driver. tp->{tx,}lock are held and thus
* we may not sleep.
*/
-static void tg3_init_rings(struct tg3 *tp)
+static void tg3_init_rings(struct tg3 *tp,
+ u32 *actual_rx_std_pending, u32 *actual_rx_jumbo_pending)
{
u32 i;
@@ -4196,6 +4197,7 @@ static void tg3_init_rings(struct tg3 *t
-1, i) < 0)
break;
}
+ *actual_rx_std_pending = i;
if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) {
for (i = 0; i < tp->rx_jumbo_pending; i++) {
@@ -4203,7 +4205,9 @@ static void tg3_init_rings(struct tg3 *t
-1, i) < 0)
break;
}
- }
+ *actual_rx_jumbo_pending = i;
+ } else
+ *actual_rx_jumbo_pending = 0;
}
/*
@@ -5805,6 +5808,7 @@ static int tg3_reset_hw(struct tg3 *tp,
{
u32 val, rdmac_mode;
int i, err, limit;
+ u32 actual_rx_std_pending, actual_rx_jumbo_pending;
tg3_disable_ints(tp);
@@ -5853,7 +5857,17 @@ static int tg3_reset_hw(struct tg3 *tp,
* can only do this after the hardware has been
* successfully reset.
*/
- tg3_init_rings(tp);
+ tg3_init_rings(tp, &actual_rx_std_pending, &actual_rx_jumbo_pending);
+ if (actual_rx_std_pending < tp->rx_pending ||
+ actual_rx_jumbo_pending < tp->rx_jumbo_pending) {
+ printk(KERN_ERR PFX
+ "%s: Failed to allocate memory for complete RX rings, "
+ "std %u/%u jumbo %u/%u.\n",
+ tp->dev->name,
+ actual_rx_std_pending, tp->rx_pending,
+ actual_rx_jumbo_pending, tp->rx_jumbo_pending);
+ /* return -ENOMEM; ?? */
+ }
/* This value is determined during the probe time DMA
* engine test, tg3_test_dma.
@@ -5946,7 +5960,7 @@ static int tg3_reset_hw(struct tg3 *tp,
}
/* Setup replenish threshold. */
- tw32(RCVBDI_STD_THRESH, tp->rx_pending / 8);
+ tw32(RCVBDI_STD_THRESH, actual_rx_std_pending / 8);
/* Initialize TG3_BDINFO's at:
* RCVDBDI_STD_BD: standard eth size rx ring
@@ -5986,7 +6000,7 @@ static int tg3_reset_hw(struct tg3 *tp,
BDINFO_FLAGS_DISABLED);
/* Setup replenish threshold. */
- tw32(RCVBDI_JUMBO_THRESH, tp->rx_jumbo_pending / 8);
+ tw32(RCVBDI_JUMBO_THRESH, actual_rx_jumbo_pending / 8);
if (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) {
tw32(RCVDBDI_JUMBO_BD + TG3_BDINFO_HOST_ADDR +
TG3_64BIT_REG_HIGH,
@@ -6045,12 +6059,12 @@ static int tg3_reset_hw(struct tg3 *tp,
BDINFO_FLAGS_MAXLEN_SHIFT),
0);
- tp->rx_std_ptr = tp->rx_pending;
+ tp->rx_std_ptr = actual_rx_std_pending;
tw32_rx_mbox(MAILBOX_RCV_STD_PROD_IDX + TG3_64BIT_REG_LOW,
tp->rx_std_ptr);
tp->rx_jumbo_ptr = (tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) ?
- tp->rx_jumbo_pending : 0;
+ actual_rx_jumbo_pending : 0;
tw32_rx_mbox(MAILBOX_RCV_JUMBO_PROD_IDX + TG3_64BIT_REG_LOW,
tp->rx_jumbo_ptr);
-