We're seeing an issue with an e1000 device. After some time (typically
4-8 hours) it jams up and stops receiving packets.
We're running 2.6.10, but with the e1000 driver from 2.6.14.
Outgoing packets seem to be fine; incoming packets just increment the
error count.
ethtool device-specific stats show the following counters:
  rx_fifo_errors: 1714150
  rx_no_buffer_count: 299
  rx_missed_errors: 1714150
The fifo and missed errors seem to actually be counting the same thing,
the "Missed Packets Count" error register.
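As a quick sanity check, the counters can be parsed and compared. This is only a sketch: parse_stats is a hypothetical helper, and SAMPLE is the counter text pasted from above rather than live output (in practice it would come from something like "ethtool -S <iface>"). Equal values are consistent with both counters being fed from the same register, but do not prove it.

```python
import re

# Counter text copied from the stats above (not live output).
SAMPLE = """\
rx_fifo_errors: 1714150
rx_no_buffer_count: 299
rx_missed_errors: 1714150
"""

def parse_stats(text):
    """Parse 'name: value' lines into a dict of integers (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

stats = parse_stats(SAMPLE)

# If both counters track the same hardware register they should stay equal.
same_source = stats["rx_fifo_errors"] == stats["rx_missed_errors"]
print("fifo == missed:", same_source)
```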
From the chip docs:
"Counts the number of missed packets. Packets are missed when the
receive FIFO has insufficient space to store the incoming packet. This
can be caused because of too few buffers allocated, or because there is
insufficient bandwidth on the PCI bus. Events setting this counter cause
RXO, the Receiver Overrun Interrupt, to be set. This register does not
increment if receives are not enabled."
The no buffer count is similar, based on the "Receive No Buffers Count"
register. From the docs:
"This register counts the number of times that frames were received when
there were no available buffers in host memory to store those frames
(receive descriptor head and tail pointers were equal). The packet is
still received if there is space in the FIFO. This register only
increments if receives are enabled."
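To make the relationship between the two counters concrete, here is a toy model of the behaviour the docs describe. It is a simplification under stated assumptions, not the real hardware logic: RxModel and its fields are hypothetical names, the FIFO is counted in whole frames rather than bytes, and the no-buffer count is assumed to stop incrementing once the FIFO is full.

```python
class RxModel:
    """Toy model of the receive ring plus FIFO (hypothetical, simplified)."""

    def __init__(self, ring_size, fifo_frames):
        self.ring_size = ring_size
        self.head = 0          # next descriptor the hardware would fill
        self.tail = 0          # one past the last descriptor the driver supplied
        self.fifo_free = fifo_frames
        self.rnbc = 0          # models "Receive No Buffers Count"
        self.mpc = 0           # models "Missed Packets Count"
        self.delivered = 0

    def refill(self, n):
        """Driver hands n fresh buffers to the hardware (n < ring_size)."""
        self.tail = (self.tail + n) % self.ring_size

    def receive(self):
        """One frame arrives from the wire."""
        if self.head != self.tail:
            # A buffer is available: the frame lands in host memory.
            self.head = (self.head + 1) % self.ring_size
            self.delivered += 1
        elif self.fifo_free > 0:
            # head == tail: no buffers, but the FIFO can still hold the frame.
            self.fifo_free -= 1
            self.rnbc += 1
        else:
            # No buffers and no FIFO space: the frame is missed.
            self.mpc += 1

m = RxModel(ring_size=256, fifo_frames=4)
m.refill(2)
for _ in range(10):
    m.receive()
print(m.delivered, m.rnbc, m.mpc)
```

In this model, once the driver stops refilling, the no-buffer count rises briefly and then every further frame shows up as a missed packet, which is the same shape as the counters above (a small rx_no_buffer_count next to a very large rx_missed_errors).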
On the jammed device, dumping the registers gives the following,
indicating that the head and tail pointers are equal:
  Receive buffer size: 2048
  0x02808: RDLEN (Receive desc length) 0x00001000
  0x02810: RDH (Receive desc head) 0x00000060
  0x02818: RDT (Receive desc tail) 0x00000060
  0x02820: RDTR (Receive delay timer) 0x00000000
So, somehow we're getting into a state where we can't receive packets,
and we're never getting out of that state.
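For reference, the register values above decode as follows. This is a rough sketch that assumes the legacy 16-byte receive descriptor format; DESC_SIZE and the regs dict are names introduced here for illustration only.

```python
# Assumption: legacy e1000 receive descriptors are 16 bytes each.
DESC_SIZE = 16

# Values copied from the register dump above.
regs = {"RDLEN": 0x00001000, "RDH": 0x00000060, "RDT": 0x00000060}

# RDLEN is the ring length in bytes, so this gives the number of descriptors.
ring_entries = regs["RDLEN"] // DESC_SIZE

# Descriptors currently available to the hardware: the distance from head to
# tail, modulo the ring size. Zero means head == tail, i.e. no buffers.
available = (regs["RDT"] - regs["RDH"]) % ring_entries

print("ring entries:", ring_entries)
print("head index:", regs["RDH"])
print("available descriptors:", available)
```

So the ring has 256 entries, both pointers sit at index 96, and the hardware sees no free descriptors, which matches the condition under which the no-buffer and missed-packet counters increment.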
Anyone have any ideas?
Chris