Re: 3c59x NIC overruns with multicast makes networking freeze

Arnaud Installe Sat, 23 Sep 2000 04:28:41 -0700
The description of the NICs in question as found in
/etc/sysconfig/hwconf
are:

3Com Corporation|3c900B-TPO [Etherlink XL TPO]
3Com Corporation|3c905C-TX [Fast Etherlink]

In one machine it's the first which is getting lots of overruns; in the
other it's the second.  Depends on which one is connected to the network
with 60% usage.
 The machines are dual-processor capable, but only a few of them have 2
processors inside.  Yes, they're using standard IDE drives.  (I've been
telling the people at my company we *need* SCSI drives forever...  :-P)
 I don't think the 3c95x driver uses promiscuous mode for multicasting, or
does it ?
 BTW I'm not sure I understand Documentation/networking/multicast.txt.  
(Is it still up-to-date ?)  I suppose IP-MRoute, and thus AllMulti's only
relevant for routers while IP-Multicast is relevant for nodes transmitting
and receiving multicast.  So that means for nodes "hardware filters really
help", so we really shouldn't be using 3c59x-drivers, right ?  Am I 
correct that Intel EtherExpress Pro 10/100 cards would be a better choice ?

On Sat, Sep 23, 2000 at 03:23:45PM +1100, Andrew Morton wrote:
> Arnaud Installe wrote:
> > 
> > Hi,
> > 
> > Has anyone else seen a lot of overruns while serving multicast on a pretty
> > loaded (60%) network, with 3c59x cards ?
> 
> This is the first I've heard of it.  Which sort of 3com card?
> 
> >  (BTW What exactly are
> > "overruns" ?  Are they buffer overflows on the NIC side or buffer
> > overflows on the kernel side, because the software can't follow, or even
> > something completely different ?)
> 
> The NIC has an 8 kbyte on-board FIFO, some of which (3-5k) is used for
> buffering incoming data.
> 
> An Rx overrun occurs when that on-board FIFO has filled up, and more
> data needs to go into it.  It shouldn't fill up.  This can be caused by
> excessive PCI traffic, or by the software not being able to feed the NIC
> new Rx buffers fast enough, or by the driver leaving the Rx engine in
> the stalled state for too long.

Do I understand correctly that those Rx buffers are DMA buffers provided
by the kernel ?

> You should try the 2.2.17 driver.  There's a copy at
> http://www.uow.edu.au/~andrewm/linux/3c59x-2.2.17+.gz

So it's included in linux-2.2.17.tar.gz, is it ?  Is it also included in
any of the 2.4.0-test kernels ?

> If it still happens (and it probably will) there are some things you can
> look at:
> 
> - Get a faster computer!  How quick is it?

Pentium III 450MHz dual-processor.  I thought there might be some locking
problem with the SMP code, so I installed a single-processor kernel.  
Problem didn't go away though.

> - Change the driver by doubling the value of RX_RING_SIZE.  Keep it a
> power of two.  This is a bit desperate, but it will give you some more
> elasticity.

What's the RX_RING_SIZE used for ?

> - Make sure the NIC isn't sharing an interrupt with another device. 
> Look at /proc/interrupts and if it's shared, have a poke in the BIOS.

Both NICs have their own IRQ.

> - It's also possible that some other device exists on your system which
> has a slow interrupt service routine, and is interrupting the 3c59x ISR
> for a long time while the Rx engine is stalled.  This is a pretty
> unlikely scenario.  Disabling interrupts around the call to
> boomerang_rx() would be slightly interesting.

I suppose IDE drives have slow ISRs ?  Does unmasking the IRQs for hd's
work reliably with most of today's IDE drives and controllers ?  The IDE
controller is an "Intel Corporation|82371AB PIIX4 IDE".

> >  After a while the network freezes.  I can't ping nor ssh to the machine.
> > Restarting the network scripts solves the problem.
> 
> There was a bug in the driver which was fixed 6-8 weeks ago.  If the
> driver's Rx path completely runs out of memory and cannot allocate a new
> buffer 16 times in a row, the receive path gets stuck and the interface
> needs to be taken down.  Moving to the 2.2.17 driver should eliminate
> this possibility.

Hmmm...  I suppose when this happens it's normal a lot of overruns occur ?
Upgrading the driver will make the networking stable, but I suppose that
won't prevent a lot of multicast UDP packets from getting dropped, or will
it ?  (Though unmasking the hd IRQ might.)

> You should also try
> 
>       echo "512 1024 1536" > /proc/sys/vm/freepages
> 
> to double the amount of memory which is reserved for drivers and such. 
> This change put a big smile on the face of a 2.2.12 user earlier this
> week.  (He had 512 megs and a reasonable, but heavy workload.  I think
> we're setting this too low).

Ok.

> Please let me know the outcome.

I sure will.  Thanks,

                                                        Arnaud

-- 
Arnaud Installe                                             [EMAIL PROTECTED]

Administration:  An ingenious abstraction in politics, designed to receive
the kicks and cuffs due to the premier or president.
-- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: 3c59x NIC overruns with multicast makes networking freeze

Reply via email to