On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote:
> I still don't understand the original problem, that the kernel is not
> even preemptible enough for network interrupts to work (except in 5.2
> where Giant breaks things). Perhaps I misread the problem, and it is
> actually that networking works but userland is unable to run in time
> to avoid packet loss.
The test is done with UDP packets between two servers. The em
driver is incrementing the received packet count correctly, but
the packets are not making it up the network stack. If
the application were not servicing the socket fast enough I would
expect to see the "dropped due to full socket buffers" (udps_fullsock)
counter incrementing, as shown by netstat -s.
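To make the check concrete, here's a small sketch (my own, not the actual tester) of how the udps_fullsock counter can be diffed across two netstat -s captures; the sample strings are illustrative, not real output:

```python
# Sketch: diff the "dropped due to full socket buffers" (udps_fullsock)
# line between two captures of `netstat -s` text. The captures below are
# made-up examples in the FreeBSD netstat -s format.
import re

def udps_fullsock(netstat_s_output: str) -> int:
    """Extract the udps_fullsock counter from netstat -s text."""
    m = re.search(r"(\d+) dropped due to full socket buffers", netstat_s_output)
    return int(m.group(1)) if m else 0

before = "udp:\n\t15969626 datagrams received\n\t0 dropped due to full socket buffers\n"
after = "udp:\n\t25965964 datagrams received\n\t0 dropped due to full socket buffers\n"

drops = udps_fullsock(after) - udps_fullsock(before)
print(drops)  # 0 -> the loss is not happening at the socket buffer
```

A nonzero diff here would point at the application not draining the socket; a zero diff (as in my tests) means the packets vanish earlier in the stack.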
I grab a copy of netstat -s, netstat -i, and netstat -m
before and after testing. Other than the link packet counters,
I haven't seen any other indication of where the packets are getting
lost. The em driver has a debugging stats option, which does not
indicate receive-side overflows.
I'm fairly certain this same behavior can be seen with the fxp
driver, but I'll need to double check.
These are results I sent a few days ago after setting up a
test without an ethernet switch between the sender and receiver.
The switch was originally used to verify that the sender was actually
transmitting. With spanning tree, ethernet keepalives, and CDP
(Cisco's proprietary neighbor protocol) disabled, and with static ARP
entries on the sender and receiver, I can account for all packets
making it to the receiver.
##
Back-to-back test with no ethernet switch between two em interfaces,
same result. The receiving side has been up > 1 day and exhibits
the problem. These are also two different servers. The small
gettimeofday() syscall tester also shows the same ~30
second pattern of high latency between syscalls.
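For reference, the tester is essentially a tight loop over the clock that reports any unusually large gap between consecutive reads. A minimal sketch (names and threshold are mine, not the original tester's):

```python
# Minimal sketch of a syscall-latency tester: repeatedly read the clock
# and record every gap between consecutive reads that exceeds a threshold.
# A healthy machine shows only tiny gaps; scheduling stalls show up as
# long ones.
import time

def find_gaps(n_samples: int = 200_000, threshold_s: float = 0.001):
    """Return (gap_start, gap_length_s) for every pair of consecutive
    clock reads separated by more than threshold_s seconds."""
    gaps = []
    prev = time.monotonic()
    for _ in range(n_samples):
        now = time.monotonic()
        if now - prev > threshold_s:
            gaps.append((prev, now - prev))
        prev = now
    return gaps

if __name__ == "__main__":
    for start, length in find_gaps():
        print(f"gap of {length * 1e6:.0f} us at t={start:.3f}")
```

Run during the bad ~30 second windows, a tester like this shows multi-millisecond gaps between what should be back-to-back syscalls.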
Receiver test application reports 3699 missed packets
Sender netstat -i (wrapped lines rejoined):
(before test)
Name  Mtu   Network   Address              Ipkts Ierrs     Opkts Oerrs  Coll
em1   1500  <Link#2>  00:04:23:cf:51:b7       20     0  15975785     0     0
em1   1500  10.1/24   10.1.0.2                37     -  15975801     -     -
(after test)
em1   1500  <Link#2>  00:04:23:cf:51:b7       22     0  25975822     0     0
em1   1500  10.1/24   10.1.0.2                39     -  25975838     -     -
total IP packets sent during test = end - start
25975838 - 15975801 = 10000037 (expected: the 10,000,000 packet test
plus overhead)
Receiver netstat -i (wrapped lines rejoined):
(before test)
Name  Mtu   Network   Address                 Ipkts Ierrs  Opkts Oerrs  Coll
em1   1500  <Link#2>  00:04:23:c4:cc:89    15975785     0     21     0     0
em1   1500  10.1/24   10.1.0.1             15969626     -     19     -     -
(after test)
em1   1500  <Link#2>  00:04:23:c4:cc:89    25975822     0     23     0     0
em1   1500  10.1/24   10.1.0.1             25965964     -     21     -     -
total ethernet frames received during test = end - start
25975822 - 15975785 = 10000037 (as expected)
total IP packets processed during test = end - start
25965964 - 15969626 = 9996338 (expecting 10000037)
Missed packets = expected - received
10000037 - 9996338 = 3699
netstat -i accounts for the 3699 missed packets also reported by the
application
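The counter arithmetic above can be spelled out in a few lines; the values are taken straight from the netstat -i output in this message:

```python
# Before/after counters copied from the netstat -i captures above.
sender_opkts_before, sender_opkts_after = 15975801, 25975838  # sender IP Opkts
rcv_link_before, rcv_link_after = 15975785, 25975822          # receiver <Link#2> Ipkts
rcv_ip_before, rcv_ip_after = 15969626, 25965964              # receiver IP-level Ipkts

sent = sender_opkts_after - sender_opkts_before    # IP packets sent
frames = rcv_link_after - rcv_link_before          # ethernet frames received
ip_seen = rcv_ip_after - rcv_ip_before             # IP packets processed
missing = frames - ip_seen                         # lost between driver and IP

print(sent, frames, ip_seen, missing)  # 10000037 10000037 9996338 3699
```

Every frame the sender put on the wire was counted by the receiving em interface, so the 3699 packets are lost somewhere between the driver and IP-level accounting.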
Looking closer at the tester output again shows the periodic
~30 second windows of packet loss.
There's a second problem here: packets are just disappearing
before they make it to ip_input(), or there's a dropped-packet
counter I haven't found yet.
I can provide remote access to anyone who wants to take a look; this
is very easy to duplicate. The ~1 day of uptime needed before the
behavior surfaces is not making this easy to isolate.
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"