On Mon, 26 Mar 2007, Michael Schmitz wrote:
> > > If you could remember what the nature of the race was, I could > > > perhaps find a way to prevent it. Note that the transmit function > > > now only stops the queue if the ring is full; that may be important. > > > > When I look at the 2.2 code now, I'm immediately suspicious of the > > interaction between sonic_send_packet() and sonic_interrupt(), with > > respect to lp->tx_full, lp->cur_tx, lp->dirty_tx and lp->tx_laddr[]. > > Reason being, there is no mutual exclusion to prevent interrupt > > handler from interfering with the other routine. > > That would be deadly in a race, perhaps. tx_skb[n] (the array elements themselves), tx_full and dirty_tx should all be made volatile because sonic_interrupt() writes to them. We may need a write barrier to ensure that send_packet updates cur_tx before tx_full? > > Still, it isn't clear to me how this leads to "Warning: kfree_skb > > passed an skb still on a list (from 000e4416)." I don't know how these > > lists work. Does the address 000e4416 tell you anything? > > The address doesn't tell me anything without the kernel symbol table > (from the range, it seems to be a static symbol, that's all I can tell). Well, I was hoping that perhaps you could get access to the machine. We know the approximate date of the crash (24th May 2006) we can probably find the symbol in one of the kernels installed at the time. But then again, you may reproduce the crash yourself sooner or later... > But if the interrupt handler calls kfree_skb while send_packet hasn't, > in fact, pulled the skb off the list (skb_release() ??), kfree_skb may > still find the skb being used. This is a warning only, but may be > symptomatic of a race between interrupt handler and send routine. > send_packet should perhaps protect against interruption while handling > the skb list. Also, it seems to me that if (lp->cur_tx < lp->dirty_tx + SONIC_NUM_TDS) is going to give the wrong result when cur_tx wraps around. There is a similar test in sonic_interrupt, if (lp->tx_full && dev->tbusy && dirty_tx + SONIC_NUM_TDS > lp->cur_tx + 2) that also looks broken to me... -f > Just speculating - not looking at the code right now. > > Michael > -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]