Just to confirm: the patch did not change the behavior.  I ran with it
last night and double-checked this morning to make sure.

It looks like if you put the check at the top of the loop and the next
node is changed during the msleep(), SLIST_NEXT() will walk into freed
memory.  I'm in over my head here....
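For anyone following along, here is a tiny userland sketch of the hazard
I think I'm describing (made-up names, not the actual ffs_sync()/vnode
code): the loop saves the next list entry before it could sleep, and
anything that unlinks and frees that entry while the loop is asleep
leaves the saved pointer aimed at freed memory.

#include <sys/queue.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a vnode; all names here are made up for illustration. */
struct fakevnode {
        int                     num;
        SLIST_ENTRY(fakevnode)  link;
};

SLIST_HEAD(fakehead, fakevnode);

int
main(void)
{
        struct fakehead head = SLIST_HEAD_INITIALIZER(head);
        struct fakevnode *vp, *nvp;
        int i;

        for (i = 0; i < 3; i++) {
                vp = calloc(1, sizeof(*vp));
                vp->num = i;
                SLIST_INSERT_HEAD(&head, vp, link);
        }

        for (vp = SLIST_FIRST(&head); vp != NULL; vp = nvp) {
                /* Next entry is saved before any work is done. */
                nvp = SLIST_NEXT(vp, link);

                /*
                 * In the kernel loop this is roughly where the
                 * msleep() would yield the processor.  While asleep,
                 * another thread can do the equivalent of
                 *
                 *      SLIST_REMOVE(&head, nvp, fakevnode, link);
                 *      free(nvp);
                 *
                 * and the saved nvp then points at freed memory, so
                 * the next iteration walks into the trash.
                 */
                printf("visiting node %d\n", vp->num);
        }
        return (0);
}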

Setting kern.maxvnodes=1000 does stop both the periodic packet loss
and the high-latency syscalls, so it does look like walking this chain
without yielding the processor is part of the problem I'm seeing.
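In case anyone wants to try the same thing, that knob can be changed at
run time with sysctl(8), nothing else was touched:

  sysctl kern.maxvnodes=1000
  sysctl kern.maxvnodes        # read back the value in effect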

The other behavior I don't understand is why the em driver is able
to increment if_ipackets but still lose the packet.

Dumping the internal stats with dev.em.1.stats=1:

Dec 19 13:07:46 dytnq-nf1 kernel: em1: Excessive collisions = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Sequence errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Defer count = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Missed Packets = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive No Buffers = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive Length Errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Crc errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Alignment errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Collision/Carrier extension errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: RX overruns = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: watchdog timeouts = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Failed = 0
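(If anyone wants to pull the same dump: setting that sysctl makes em(4)
print its counters to the kernel message buffer, which is where the
syslog lines above came from.)

  sysctl dev.em.1.stats=1
  dmesg | tail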

With FreeBSD 4 I was able to run a UDP data collector with rtprio set
and kern.ipc.maxsockbuf=20480000, and then use setsockopt() with
SO_RCVBUF in the application.  If packets were dropped they would show
up in netstat -s as "dropped due to full socket buffers".
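A minimal sketch of that setup, assuming a plain AF_INET/SOCK_DGRAM
receiver (the port number and the 16 MB buffer size below are just
placeholders; the request has to fit under kern.ipc.maxsockbuf):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>

int
main(void)
{
        struct sockaddr_in sin;
        int s, rcvbuf = 16 * 1024 * 1024;  /* placeholder size */

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
                err(1, "socket");

        /*
         * Ask for a large socket receive buffer.  Once the packet gets
         * this far, anything the application can't drain fast enough
         * is counted by the kernel as "dropped due to full socket
         * buffers" instead of vanishing silently.
         */
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
            sizeof(rcvbuf)) == -1)
                err(1, "setsockopt(SO_RCVBUF)");

        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(9000);        /* placeholder port */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
                err(1, "bind");

        /* recvfrom() loop for the actual collection would go here. */
        return (0);
}

The collector was then started under rtprio(1) so the receive loop
didn't get starved by other userland work.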

Since the packet never makes it to ip_input(), I no longer have any
way to count drops.  There will always be corner cases where
interrupts are lost and drops go unaccounted for if the adapter
hardware can't report them, but right now I've got no way to
estimate any loss.
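(For completeness, this is the counter that used to catch those drops;
it still exists, the packets just never get far enough to hit it now.)

  netstat -s -p udp | grep "full socket buffers"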

--
mark

On Dec 19, 2007, at 12:13 PM, David G Lawrence wrote:

Try it with "find / -type f >/dev/null" to duplicate the problem
almost instantly.

I was able to verify last night that (cd /; tar -cpf -) > all.tar
would trigger the problem.  I'm working on getting a test running with
David's ffs_sync() workaround now; adding a few counters there should
get this narrowed down a little more.

Unfortunately, the version of the patch that I sent out isn't going to
help your problem. It needs to yield at the top of the loop, but vp
isn't necessarily valid after the wakeup from the msleep. That's a
problem that I'm having trouble figuring out a solution to - the
solutions that come to mind will all significantly increase the
overhead of the loop.
   As a very inadequate work-around, you might consider lowering
kern.maxvnodes to something like 20000 - that might be low enough to
not trigger the problem, but also be high enough to not significantly
affect system I/O performance.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
