David Russell <da...@aprsworld.com> : > When connected directly to another system (not via a switch) > eventually a condition where a NULL pointer dereference occurs in > enc28j60_hw_tx() and this patch simply checks for that condition and > returns gracefully without causing a kernel panic. I believe, but > have not investigated this is caused by a packet collision and am not > sure if the kernel tracks collisions or counts them as errors, so that > should probably be added if this is what's happening. I'm also not > familiar with the linux kernel, so may have fixed this in a less than > ideal way.
Is it possible for EIR.EIR_TXERIF and EIR.EIR_TXIF to be set for the same packet ? If so the driver is intrinsically racy: - EIR.EIR_TXIF completes transmission, clears tx_skb and enables queueing again (see netif_wake_queue in enc28j60_tx_clear) - insert start_xmit here: tx_skb is set and enc28j60_hw_tx is scheduled for late execution (user context work) - EIR.EIR_EIR.EIR_TXERIF issues same enc28j60_tx_clear and clears tx_skb - enc28j60_hw_tx is run but tx_skb is NULL > diff --git a/drivers/net/ethernet/microchip/enc28j60.c > b/drivers/net/ethernet/microchip/enc28j60.c > index 86ea17e..36ac65f 100644 > --- a/drivers/net/ethernet/microchip/enc28j60.c > +++ b/drivers/net/ethernet/microchip/enc28j60.c > @@ -1233,6 +1233,9 @@ static void enc28j60_irq_work_handler(struct > work_struct *work) > */ > static void enc28j60_hw_tx(struct enc28j60_net *priv) > { > + if (!priv->tx_skb) > + return; > + > if (netif_msg_tx_queued(priv)) > printk(KERN_DEBUG DRV_NAME > ": Tx Packet Len:%d\n", priv->tx_skb->len); enc28j60_hw_tx isn't the culprit. It's the victim. This change silences the bug but it does not fix it at all. -- Ueimor