Re: Packet loss every 30.999 seconds

2007-12-27 Thread Bruce Evans
On Fri, 28 Dec 2007, Bruce Evans wrote: On Fri, 28 Dec 2007, Bruce Evans wrote: In previous mail, you (Mark) wrote: # With FreeBSD 4 I was able to run a UDP data collector with rtprio set, # kern.ipc.maxsockbuf=2048, then use setsockopt() with SO_RCVBUF # in the application. If packets w

Re: Packet loss every 30.999 seconds

2007-12-27 Thread Bruce Evans
On Fri, 28 Dec 2007, Bruce Evans wrote: In previous mail, you (Mark) wrote: # With FreeBSD 4 I was able to run a UDP data collector with rtprio set, # kern.ipc.maxsockbuf=2048, then use setsockopt() with SO_RCVBUF # in the application. If packets were dropped they would show up # with nets
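
For readers unfamiliar with the SO_RCVBUF setup Mark describes, here is a minimal sketch in C. The helper name set_rcvbuf() is hypothetical and is not taken from Mark's collector. It requests a receive buffer size and reads back what the kernel actually granted; on FreeBSD the request is expected to be limited by kern.ipc.maxsockbuf, so the returned value is worth checking.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ask for a receive buffer of `bytes` on socket `fd`
 * and return the size the kernel reports afterwards, or -1 on error.
 * The reported size may differ from the request (some systems round or
 * scale it), so callers should not assume the two are equal.
 */
static int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

A collector would call this once on its UDP socket, right after socket() and before bind(), and log the returned value.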

Re: Packet loss every 30.999 seconds

2007-12-27 Thread Bruce Evans
On Sat, 22 Dec 2007, Mark Fullmer wrote: On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote: I still don't understand the original problem, that the kernel is not even preemptible enough for network interrupts to work (except in 5.2 where Giant breaks things). Perhaps I misread the problem, and

Re: Packet loss every 30.999 seconds

2007-12-26 Thread Kris Kennaway
Mark Fullmer wrote: On Dec 25, 2007, at 12:27 AM, Kostik Belousov wrote: What fs do you use ? If FFS, are softupdates turned on ? Please, show the total time spent in the softdepflush process. Also, try to add the FULL_PREEMPTION kernel config option and report whether it helps. FFS with s

Re: Packet loss every 30.999 seconds

2007-12-26 Thread Mark Fullmer
On Dec 25, 2007, at 12:27 AM, Kostik Belousov wrote: What fs do you use ? If FFS, are softupdates turned on ? Please, show the total time spent in the softdepflush process. Also, try to add the FULL_PREEMPTION kernel config option and report whether it helps. FFS with soft updates on all

Re: Packet loss every 30.999 seconds

2007-12-24 Thread Kostik Belousov
On Mon, Dec 24, 2007 at 08:16:50PM -0500, Mark Fullmer wrote: > > On Dec 24, 2007, at 8:19 AM, Kostik Belousov wrote: > > > > >Mark, could you, please, retest the patch below in your setup ? > >I want to put a change or some edition of it into the 7.0 release, and > >we need to move fast to do th

Re: Packet loss every 30.999 seconds

2007-12-24 Thread Mark Fullmer
On Dec 24, 2007, at 8:19 AM, Kostik Belousov wrote: Mark, could you, please, retest the patch below in your setup ? I want to put a change or some edition of it into the 7.0 release, and we need to move fast to do this. It's building now. The testing will run overnight. Your patch to ffs_s

Re: Packet loss every 30.999 seconds

2007-12-24 Thread Bruce Evans
On Mon, 24 Dec 2007, Kostik Belousov wrote: On Sun, Dec 23, 2007 at 10:20:31AM +1100, Bruce Evans wrote: On Sat, 22 Dec 2007, Kostik Belousov wrote: Ok, since you talked about this first :). I already made the following patch, but did not published it since I still did not inspected all caller

Re: Packet loss every 30.999 seconds

2007-12-24 Thread Kostik Belousov
On Sun, Dec 23, 2007 at 10:20:31AM +1100, Bruce Evans wrote: > On Sat, 22 Dec 2007, Kostik Belousov wrote: > >Ok, since you talked about this first :). I already made the following > >patch, but did not published it since I still did not inspected all > >callers of MNT_VNODE_FOREACH() for safety of

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Alfred Perlstein
* David G Lawrence <[EMAIL PROTECTED]> [071221 23:31] wrote: > > > > Can you use a placeholder vnode as a place to restart the scan? > > > > you might have to mark it special so that other threads/things > > > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > > > restart p

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Bruce Evans
On Sat, 22 Dec 2007, Kostik Belousov wrote: On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote: On Sat, 22 Dec 2007, Kostik Belousov wrote: Yes, rewriting the syncer is the right solution. It probably cannot be done quickly enough. If the yield workaround provide mitigation for now, i

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Kostik Belousov
On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote: > On Sat, 22 Dec 2007, Kostik Belousov wrote: > >Yes, rewriting the syncer is the right solution. It probably cannot be done > >quickly enough. If the yield workaround provide mitigation for now, it > >shall go in. > > I don't think rewr

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Mark Fullmer
This appears to work. No packet loss with vfs.numvnodes at 32132, 16K PPS test with 1 million packets. I'll run some additional tests bringing vfs.numvnodes closer to kern.maxvnodes. On Dec 22, 2007, at 2:03 AM, Kostik Belousov wrote: As Bruce Evans noted, there is a vfs_msync() that do almo

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Mark Fullmer
On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote: I still don't understand the original problem, that the kernel is not even preemptible enough for network interrupts to work (except in 5.2 where Giant breaks things). Perhaps I misread the problem, and it is actually that networking works but u

Re: Packet loss every 30.999 seconds

2007-12-22 Thread Bruce Evans
On Sat, 22 Dec 2007, Kostik Belousov wrote: On Fri, Dec 21, 2007 at 05:43:09PM -0800, David Schwartz wrote: I'm just an observer, and I may be confused, but it seems to me that this is motion in the wrong direction (at least, it's not going to fix the actual problem). As I understand the probl

Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> > > Can you use a placeholder vnode as a place to restart the scan? > > > you might have to mark it special so that other threads/things > > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > > restart point. > > > >That was one of the solutions that I considered and
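
The placeholder idea quoted above can be sketched in userspace C. This is not the kernel's vnode list code: the struct node type, the helper names and the yield callback are all hypothetical stand-ins. The scanner keeps its own marker element in the list, and after every pause it resumes from the marker instead of from a saved node pointer, so it does not matter whether the node it last visited is still there after the pause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode on a per-mount list. */
struct node {
    struct node *prev, *next;
    int is_marker;   /* 1 for placeholder elements, which scanners skip */
    int dirty;       /* 1 if the node needs flushing */
};

/* Circular doubly-linked list with a sentinel head. */
struct list { struct node head; };

static void list_init(struct list *l)
{
    l->head.prev = l->head.next = &l->head;
    l->head.is_marker = 0;
    l->head.dirty = 0;
}

static void insert_after(struct node *pos, struct node *n)
{
    n->prev = pos;
    n->next = pos->next;
    pos->next->prev = n;
    pos->next = n;
}

static void remove_node(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/*
 * Walk the list, "flushing" dirty nodes, and call yield_fn (if not NULL)
 * after every `batch` elements. Only the marker is relied upon across a
 * yield. Returns the number of nodes flushed.
 */
static int scan_with_marker(struct list *l, int batch,
                            void (*yield_fn)(void *), void *arg)
{
    struct node marker = { .is_marker = 1 };
    int flushed = 0, since_yield = 0;

    assert(batch > 0);
    insert_after(&l->head, &marker);
    for (;;) {
        struct node *n = marker.next;
        if (n == &l->head)
            break;                      /* reached the end of the list */
        remove_node(&marker);           /* step the marker past n */
        insert_after(n, &marker);
        if (!n->is_marker && n->dirty) {
            n->dirty = 0;
            flushed++;
        }
        if (++since_yield >= batch) {
            since_yield = 0;
            if (yield_fn != NULL)
                yield_fn(arg);          /* n must not be used after this */
        }
    }
    remove_node(&marker);
    return flushed;
}
```

In a kernel setting the yield would be something like the one-tick msleep or uio_yield() discussed in this thread, and the list lock would be dropped around it; other code walking the list would also have to skip markers, which is the "mark it special" caveat in the quoted message.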

Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> As Bruce Evans noted, there is a vfs_msync() that do almost the same > traversal of the vnodes. It was missed in the previous patch. Try this one. I forgot to comment on that when Bruce pointed that out. My solution has been to comment out the call to vfs_msync. :-) It comes into play when yo

Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> I'm just an observer, and I may be confused, but it seems to me that this is > motion in the wrong direction (at least, it's not going to fix the actual > problem). As I understand the problem, once you reach a certain point, the > system slows down *every* 30.999 seconds. Now, it's possible for

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Kostik Belousov
On Sat, Dec 22, 2007 at 01:28:31AM -0500, Mark Fullmer wrote: > > On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: > >Lets check whether the syncer is the culprit for you. > >Please, change the value of the syncdelay at the sys/kern/vfs_subr.c > >around the line 238 from 30 to some other value

Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> >What patch you have used ? > > This is hand applied from the diff you sent December 19, 2007 1:24:48 > PM EST Mark, try the previous patch from Kostik - the one that does the one tick msleep. I think you'll find that that one does work. The likely problem with the second version is that ui

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Mark Fullmer
On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: The uio_yield() idea did not work. Still have the same 31 second interval packet loss. What patch you have used ? This is hand applied from the diff you sent December 19, 2007

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Kostik Belousov
On Fri, Dec 21, 2007 at 04:24:32PM -0800, Alfred Perlstein wrote: > * David G Lawrence <[EMAIL PROTECTED]> [071221 15:42] wrote: > > > >Unfortunately, the version of the patch that I sent out isn't going > > > > to > > > > help your problem. It needs to yield at the top of the loop, but vp >

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Kostik Belousov
On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: > The uio_yield() idea did not work. Still have the same 31 second > interval packet loss. What patch you have used ? Lets check whether the syncer is the culprit for you. Please, change the value of the syncdelay at the sys/kern/vfs

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Kostik Belousov
On Fri, Dec 21, 2007 at 05:43:09PM -0800, David Schwartz wrote: > > > I'm just an observer, and I may be confused, but it seems to me that this is > motion in the wrong direction (at least, it's not going to fix the actual > problem). As I understand the problem, once you reach a certain point, t

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Mark Fullmer
The uio_yield() idea did not work. Still have the same 31 second interval packet loss. Is it safe to assume the vp will be valid after a msleep() or uio_yield()? If so can we do something a little different: Currently: /* this takes too long when list is large */ MNT_VNODE_FOREACH(vp, mp

RE: Packet loss every 30.999 seconds

2007-12-21 Thread David Schwartz
I'm just an observer, and I may be confused, but it seems to me that this is motion in the wrong direction (at least, it's not going to fix the actual problem). As I understand the problem, once you reach a certain point, the system slows down *every* 30.999 seconds. Now, it's possible for the co

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Alfred Perlstein
* David G Lawrence <[EMAIL PROTECTED]> [071221 15:42] wrote: > > >Unfortunately, the version of the patch that I sent out isn't going to > > > help your problem. It needs to yield at the top of the loop, but vp isn't > > > necessarily valid after the wakeup from the msleep. That's a problem tha

Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> >Unfortunately, the version of the patch that I sent out isn't going to > > help your problem. It needs to yield at the top of the loop, but vp isn't > > necessarily valid after the wakeup from the msleep. That's a problem that > > I'm having trouble figuring out a solution to - the solutions

Re: Packet loss every 30.999 seconds

2007-12-21 Thread Alfred Perlstein
* David G Lawrence <[EMAIL PROTECTED]> [071219 09:12] wrote: > > >Try it with "find / -type f >/dev/null" to duplicate the problem > > >almost > > >instantly. > > > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would > > trigger the problem. I'm working getting a test runn

Re: Packet loss every 30.999 seconds

2007-12-20 Thread Mark Fullmer
Thanks, I'll test this later on today. On Dec 19, 2007, at 1:11 PM, Kostik Belousov wrote: On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote: Try it with "find / -type f >/dev/null" to duplicate the problem almost instantly. I was able to verify last night that (cd /; tar -cpf

Re: Packet loss every 30.999 seconds

2007-12-20 Thread Peter Jeremy
On Wed, Dec 19, 2007 at 12:06:59PM -0500, Mark Fullmer wrote: >Thanks for the other info on timer resolution, I overlooked >clock_gettime(). If you have a UP system with a usable TSC (or equivalent) then using rdtsc() (or equivalent) is a much cheaper way to measure short durations with high resol

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Kostik Belousov
On Wed, Dec 19, 2007 at 11:44:00AM -0800, Julian Elischer wrote: > David G Lawrence wrote: > >>> In any case, it appears that my patch is a no-op, at least for the > >>>problem I was trying to solve. This has me confused, however, because at > >>>one point the problem was mitigated with it. The pat

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Julian Elischer
David G Lawrence wrote: In any case, it appears that my patch is a no-op, at least for the problem I was trying to solve. This has me confused, however, because at one point the problem was mitigated with it. The patch has gone through several iterations, however, and it could be that it was mad

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Wed, 19 Dec 2007, David G Lawrence wrote: The patch should work fine. IIRC, it yields voluntarily so that other things can run. I committed a similar hack for uiomove(). It was It patches the bottom of the loop, which is only reached if the vnode is dirty. So it will only help if there

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Mark Fullmer
Just to confirm the patch did not change the behavior. I ran with it last night and double checked this morning to make sure. It looks like if you put the check at the top of the loop and the next node is changed during msleep() SLIST_NEXT will walk into the trash. I'm in over my head here..

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Kostik Belousov
On Wed, Dec 19, 2007 at 08:11:59PM +0200, Kostik Belousov wrote: > On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote: > > > >Try it with "find / -type f >/dev/null" to duplicate the problem > > > >almost > > > >instantly. > > > > > > I was able to verify last night that (cd /; tar

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Kostik Belousov
On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote: > > >Try it with "find / -type f >/dev/null" to duplicate the problem > > >almost > > >instantly. > > > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would > > trigger the problem. I'm working getting a test

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Thu, 20 Dec 2007, Bruce Evans wrote: On Wed, 19 Dec 2007, David G Lawrence wrote: Considering that the CPU clock cycle time is on the order of 300ps, I would say 125ns to do a few checks is pathetic. As I said, 125 nsec is a short time in this context. It is approximately the time for a

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Wed, 19 Dec 2007, David G Lawrence wrote: Try it with "find / -type f >/dev/null" to duplicate the problem almost instantly. FreeBSD used to have some code that would cause vnodes with no cached pages to be recycled quickly (which would have made a simple find ineffective without reading

Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> >Try it with "find / -type f >/dev/null" to duplicate the problem > >almost > >instantly. > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would > trigger the problem. I'm working getting a test running with > David's ffs_sync() workaround now, adding a few counters there

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Mark Fullmer
On Dec 19, 2007, at 9:54 AM, Bruce Evans wrote: On Tue, 18 Dec 2007, Mark Fullmer wrote: A little progress. I have a machine with a KTR enabled kernel running. Another machine is running David's ffs_vfsops.c's patch. I left two other machines (GENERIC kernels) running the packet loss tes

Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> > In any case, it appears that my patch is a no-op, at least for the > >problem I was trying to solve. This has me confused, however, because at > >one point the problem was mitigated with it. The patch has gone through > >several iterations, however, and it could be that it was made to the top

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Wed, 19 Dec 2007, David G Lawrence wrote: Debugging shows that the problem is like I said. The loop really does take 125 ns per iteration. This time is actually not very much. The Considering that the CPU clock cycle time is on the order of 300ps, I would say 125ns to do a few checks i

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Stephan Uphoff
David G Lawrence wrote: Try it with "find / -type f >/dev/null" to duplicate the problem almost instantly. FreeBSD used to have some code that would cause vnodes with no cached pages to be recycled quickly (which would have made a simple find ineffective without reading the files at lea

Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
>In any case, it appears that my patch is a no-op, at least for the > problem I was trying to solve. This has me confused, however, because at > one point the problem was mitigated with it. The patch has gone through > several iterations, however, and it could be that it was made to the top > o

Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> Try it with "find / -type f >/dev/null" to duplicate the problem almost > instantly. FreeBSD used to have some code that would cause vnodes with no cached pages to be recycled quickly (which would have made a simple find ineffective without reading the files at least a little bit). I guess th

Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> On Tue, 18 Dec 2007, David G Lawrence wrote: > > >>>I got an almost identical delay (with 64000 vnodes). > >>> > >>>Now, 17ms isn't much. > >> > >> Says you. On modern systems, trying to run a pseudo real-time > >> application > >>on an otherwise quiescent system, 17ms is just short of an e

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Tue, 18 Dec 2007, Mark Fullmer wrote: A little progress. I have a machine with a KTR enabled kernel running. Another machine is running David's ffs_vfsops.c's patch. I left two other machines (GENERIC kernels) running the packet loss test overnight. At ~ 32480 seconds of uptime the proble

Re: Packet loss every 30.999 seconds

2007-12-19 Thread Bruce Evans
On Tue, 18 Dec 2007, David G Lawrence wrote: I got an almost identical delay (with 64000 vnodes). Now, 17ms isn't much. Says you. On modern systems, trying to run a pseudo real-time application on an otherwise quiescent system, 17ms is just short of an eternity. I agree that the syncer sho

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Mark Fullmer
A little progress. I have a machine with a KTR enabled kernel running. Another machine is running David's ffs_vfsops.c's patch. I left two other machines (GENERIC kernels) running the packet loss test overnight. At ~ 32480 seconds of uptime the problem starts. This is really close to a 16 b

Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> > I got an almost identical delay (with 64000 vnodes). > > > > Now, 17ms isn't much. > >Says you. On modern systems, trying to run a pseudo real-time application > on an otherwise quiescent system, 17ms is just short of an eternity. I agree > that the syncer should be preemptable (which is

Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> I got an almost identical delay (with 64000 vnodes). > > Now, 17ms isn't much. Says you. On modern systems, trying to run a pseudo real-time application on an otherwise quiescent system, 17ms is just short of an eternity. I agree that the syncer should be preemptable (which is what my bandai

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Bruce Evans
On Tue, 18 Dec 2007, David G Lawrence wrote: I didn't say it caused any bogus disk I/O. My original problem (after a day or two of uptime) was an occasional large scheduling delay for a process that needed to process VoIP frames in real-time. It was happening every 31 seconds and was causing v

Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> On Tue, 18 Dec 2007, David G Lawrence wrote: > > >>Thanks. Have a kernel building now. It takes about a day of uptime > >>after reboot before I'll see the problem. > > > > You may also wish to try to get the problem to occur sooner after boot > >on a non-patched system by doing a "tar cf /dev

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Bruce Evans
On Tue, 18 Dec 2007, David G Lawrence wrote: Thanks. Have a kernel building now. It takes about a day of uptime after reboot before I'll see the problem. You may also wish to try to get the problem to occur sooner after boot on a non-patched system by doing a "tar cf /dev/null /" (note: su

Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> Thanks. Have a kernel building now. It takes about a day of uptime > after reboot before I'll see the problem. You may also wish to try to get the problem to occur sooner after boot on a non-patched system by doing a "tar cf /dev/null /" (note: substitute /dev/zero instead of /dev/null, i

Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> >Right, it's a non-optimal loop when N is very large, and that's a fairly > >well understood problem. I think what DG was getting at, though, is > >that this massive flush happens every time the syncer runs, which > >doesn't seem correct. Sure, maybe you just rsynced 100,000 files 20 > >seconds

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Bruce Evans
On Mon, 17 Dec 2007, Scott Long wrote: Bruce Evans wrote: On Mon, 17 Dec 2007, David G Lawrence wrote: One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of

Re: Packet loss every 30.999 seconds

2007-12-18 Thread Bruce Evans
On Mon, 17 Dec 2007, Mark Fullmer wrote: Thanks. Have a kernel building now. It takes about a day of uptime after reboot before I'll see the problem. Yes run "find / >/dev/null" to see the problem if it is the syncer one. At least the syscall latency problem does seem to be this. Under ~5.

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Scott Long
Bruce Evans wrote: On Mon, 17 Dec 2007, David G Lawrence wrote: One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of them?) are getting synced (i.e. calling f

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Bruce Evans
On Mon, 17 Dec 2007, David G Lawrence wrote: One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of them?) are getting synced (i.e. calling ffs_syncvnode()) ever

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Bruce Evans
On Mon, 17 Dec 2007, David G Lawrence wrote: While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely degrade network performance every

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Mark Fullmer
Thanks. Have a kernel building now. It takes about a day of uptime after reboot before I'll see the problem. -- mark On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote: While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled

Re: Packet loss every 30.999 seconds

2007-12-17 Thread Mark Fullmer
Back to back test with no ethernet switch between two em interfaces, same result. The receiving side has been up > 1 day and exhibits the problem. These are also two different servers. The small gettimeofday() syscall tester also shows the same ~30 second pattern of high latency between syscall
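
Mark's tester itself is not shown in this listing. As a rough sketch of what such a gettimeofday() latency probe might look like (the function name max_gap_usec() is an assumption, not his code), the loop below calls gettimeofday() back to back and records the largest gap between two consecutive calls. On an idle machine the gap should stay small; a periodic stall like the one in this thread would show up as an occasional large value.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() `iterations` times in a tight loop and return the
 * largest interval, in microseconds, seen between consecutive calls.
 * Returns 0 when iterations is 0.
 */
static long max_gap_usec(long iterations)
{
    struct timeval prev, now;
    long worst = 0;

    assert(iterations >= 0);
    gettimeofday(&prev, NULL);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&now, NULL);
        long gap = (long)(now.tv_sec - prev.tv_sec) * 1000000L
                 + (long)(now.tv_usec - prev.tv_usec);
        if (gap > worst)
            worst = gap;
        prev = now;
    }
    return worst;
}
```

Running it in an outer loop for a few minutes and printing the result alongside a wall-clock timestamp would make a roughly 31 second pattern visible, if there is one. Note that gettimeofday() follows the wall clock, so a clock step during the run would distort a single sample.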

Re: Packet loss every 30.999 seconds

2007-12-17 Thread David G Lawrence
> While trying to diagnose a packet loss problem in a RELENG_6 snapshot > dated > November 8, 2007 it looks like I've stumbled across a broken driver or > kernel routine which stops interrupt processing long enough to severely > degrade network performance every 30.99 seconds. I noticed this a

Re: Packet loss every 30.999 seconds

2007-12-17 Thread David G Lawrence
One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of them?) are getting synced (i.e. calling ffs_syncvnode()) every time. This should normally only happen for di

Packet loss every 30.999 seconds

2007-12-16 Thread Mark Fullmer
While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely degrade network performance every 30.99 seconds. Packets appear to make it as