Packet loss every 30.999 seconds

2007-12-16 Thread Mark Fullmer
While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007, it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely degrade network performance every 30.99 seconds. Packets appear to make it as

Re: Packet loss every 30.999 seconds

2007-12-16 Thread Mark Fullmer
On Mon, Dec 17, 2007 at 12:21:43AM -0500, Mark Fullmer wrote: While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007, it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Mark Fullmer
17, 2007 at 12:21:43AM -0500, Mark Fullmer wrote: While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007, it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely degrade network

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Mark Fullmer
Thanks. Have a kernel building now. It takes about a day of uptime after reboot before I'll see the problem. -- mark On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote: While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007, it looks like I've stumbled

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Mark Fullmer
A little progress. I have a machine with a KTR-enabled kernel running. Another machine is running David's ffs_vfsops.c patch. I left two other machines (GENERIC kernels) running the packet loss test overnight. At ~32480 seconds of uptime the problem starts. This is really close to a 16 b

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Mark Fullmer
On Dec 19, 2007, at 9:54 AM, Bruce Evans wrote: On Tue, 18 Dec 2007, Mark Fullmer wrote: A little progress. I have a machine with a KTR-enabled kernel running. Another machine is running David's ffs_vfsops.c patch. I left two other machines (GENERIC kernels) running the pac

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Mark Fullmer
Just to confirm, the patch did not change the behavior. I ran with it last night and double-checked this morning to make sure. It looks like if you put the check at the top of the loop and the next node is changed during msleep(), SLIST_NEXT will walk into the trash. I'm in over my head here.

Re: Packet loss every 30.999 seconds

2007-12-20 Thread Mark Fullmer
Thanks, I'll test this later on today. On Dec 19, 2007, at 1:11 PM, Kostik Belousov wrote: On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote: Try it with "find / -type f >/dev/null" to duplicate the problem almost instantly. I was able to verify last night that (cd /; tar -cpf

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Mark Fullmer
The uio_yield() idea did not work. Still have the same 31-second interval packet loss. Is it safe to assume the vp will be valid after a msleep() or uio_yield()? If so, can we do something a little different: Currently: /* this takes too long when list is large */ MNT_VNODE_FOREACH(vp, mp

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Mark Fullmer
On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: The uio_yield() idea did not work. Still have the same 31-second interval packet loss. What patch have you used? This is hand applied from the diff you sent December 19

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Mark Fullmer
On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote: I still don't understand the original problem, that the kernel is not even preemptible enough for network interrupts to work (except in 5.2 where Giant breaks things). Perhaps I misread the problem, and it is actually that networking works but u

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Mark Fullmer
This appears to work. No packet loss with vfs.numvnodes at 32132, 16K PPS test with 1 million packets. I'll run some additional tests bringing vfs.numvnodes closer to kern.maxvnodes. On Dec 22, 2007, at 2:03 AM, Kostik Belousov wrote: As Bruce Evans noted, there is a vfs_msync() that does almo

Re: Packet loss every 30.999 seconds

2007-12-24 Thread Mark Fullmer
On Dec 24, 2007, at 8:19 AM, Kostik Belousov wrote: Mark, could you please retest the patch below in your setup? I want to put a change or some edition of it into the 7.0 release, and we need to move fast to do this. It's building now. The testing will run overnight. Your patch to ffs_s

Re: Packet loss every 30.999 seconds

2007-12-26 Thread Mark Fullmer
On Dec 25, 2007, at 12:27 AM, Kostik Belousov wrote: What fs do you use? If FFS, are softupdates turned on? Please show the total time spent in the softdepflush process. Also, try to add the FULL_PREEMPTION kernel config option and report whether it helps. FFS with soft updates on all