Hi Brian,

It was a good thought, but we can't put the blame on bad hardware.

These tests were done on the RELENG_3 system cvsup'd 
as of Feb 22 @ 20:00 EST. All tests were run internal to the
same machine. So that I don't remain the only guy in the world
to see these test results, control files are included so you 
can test locally :-)

"ppp0 -direct" on localhost is started by port 6671.

I know (now) that with the test set up this way the ppp's were
communicating via localhost rather than the tunnel, but it made it
much cleaner to verify exactly how close the results were to what
I saw running the server under 2.2-stable. There were
differences, but the main issues are demonstrated.

You will recall our discussion about the server hanging around
under 2.2-stable after the client is terminated? Required by the
RFCs you said? Under RELENG_3 the server meekly goes away, which
makes sense to me.

Two tests were done. The first involved "kill -KILL clientpid".
The second was "kill -TERM clientpid".
In the first test, the server illegally removed the default route.
In the second test, the server did the same - neither ppp actioned
the second command in the linkdown scripts.
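
For anyone reproducing the two tests, the difference between the
signals can be sketched in a few lines of Python. This is not ppp
code and says nothing about how ppp handles signals internally; the
child script, the marker file and the run_and_signal() helper are all
hypothetical stand-ins. It only shows the general point that a process
can catch TERM and run a cleanup step, while KILL cannot be caught,
so the killed process itself never gets the chance to run anything.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a client process: on SIGTERM it runs a
# "linkdown"-style cleanup step (here, just writing a marker file).
CHILD = r"""
import signal, sys, time
marker = sys.argv[1]
def on_term(signum, frame):
    with open(marker, "w") as f:
        f.write("cleanup ran\n")
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""

def run_and_signal(sig):
    """Start the child, send it `sig`, report whether its cleanup ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE, text=True,
        )
        proc.stdout.readline()   # wait until the handler is installed
        proc.send_signal(sig)
        proc.wait()
        proc.stdout.close()
        return os.path.exists(marker)

if __name__ == "__main__":
    print("TERM cleanup ran:", run_and_signal(signal.SIGTERM))
    print("KILL cleanup ran:", run_and_signal(signal.SIGKILL))
```

So in a KILL test any cleanup has to come from the surviving peer,
whereas in a TERM test both ends at least have the opportunity.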

I was surprised that the first test ended immediately - I thought
the LQR packets would cause the server to terminate after 1 minute.

Files:
test1.netstat0  shows routing after boot
test1.netstat1  shows routing after "ppp -background testloop"
test1.psaxl     shows ps results for the executing processes.
test1.netstat2  shows routing after killing the client.
test1.tun0      ifconfig while active.
test1.tun1      ifconfig while active.
test2.netstat   routing tables after terminating the client.
Logs are supplied for both tests.
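
When comparing the netstat files, the default-route check can be
automated with something like the sketch below. It assumes the
routing table text has one route per line with the destination in the
first column and the word "default" for the default route; the
has_default_route() helper and both sample tables are made up for
illustration and are not taken from the attached files.

```python
def has_default_route(netstat_text):
    """Return True if any line's first field is 'default'."""
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            return True
    return False

# Invented sample tables (documentation addresses), NOT real test output.
BEFORE = """\
Destination        Gateway            Flags
default            192.0.2.1          UGSc
127.0.0.1          127.0.0.1          UH
"""

AFTER = """\
Destination        Gateway            Flags
127.0.0.1          127.0.0.1          UH
"""

if __name__ == "__main__":
    print("before:", has_default_route(BEFORE))
    print("after: ", has_default_route(AFTER))
```

Running the same check over test1.netstat1 and test1.netstat2 would
be one way to confirm the route disappears when the client is killed.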

I hope that this is very helpful to you. I really appreciate
your efforts!!

Cheers,
Tom

> Hi,
> 
> I don't claim to know a great deal about cache code etc, but I'm 
> pretty sure that it's extremely unlikely that the file name has any 
> chance of affecting the buffer cache.  While NFS has its fair 
> share of problems (with which Matt is dealing admirably), I 
> would think that the code that does the work there is equally unlikely 
> to know anything about file names.
> 
> Having said all that in as vague a way as possible, the reason I'm 
> posting this is that you seem to be experiencing difficulties with 
> ppp that are of a similar nature - that is, completely inexplicable 
> and unseen by anyone else - disappearing default routes, ppp.linkdown 
> not being processed.
> 
> I'm beginning to suspect a hardware problem - perhaps with your disk 
> controller or something.  This wouldn't easily explain the default 
> route problem, but may explain the failure to process ppp.linkdown....
> 
> Maybe you could try treating the other machine (your son's machine?) 
> as the gateway, and see if things become more stable.  If they do, 
> the finger might be pointed more firmly at hardware.
> 
> > On the weekend I reported to hackers about problems experienced with
> > 2.2-stable and RELENG-3 systems where I experienced files that
> > disappeared from cache and Mail directories that disappeared.
> > The RELENG-3 system had files affected with softupdates enabled.
> > The 2.2-stable system had sub-directories missing from the
> > same directories that I was writing to via nfsv2.
> > 
> > By coincidence, I had cvsup'd and compiled new kernels and naturally
> > made the assumption that there was causality there. Subsequently
> > I have come to believe that the problem may have more to do with what 
> > I was doing, not changes to the code.
> > 
> > For about 3-4 hours prior to noticing the problems, I had been 
> > repetitively editing dot files, then writing a kludge of dot files
> > to the local system hard drive and to the nfs exported FS of the
> > other computer, while occasionally checking mail on that computer. 
> > 
> > All files and directories missing were being updated for
> > one reason or another by myself or by mail processes while
> > I was doing this.
> > 
> > It is speculation, but there is a good chance that there is a bug
> > in the cache-handling code that causes problems with other files
> > or directories being dropped from cache because of bad processing
> > common to BOTH or ALL releases, when large numbers of dot files are
> > being written. The dot files themselves did not disappear - other
> > items to be written disappeared before their writes actually 
> > occurred. 
> > 
> > I know that this is a frustrating kind of message to receive, but
> > I am not a developer & not qualified to go into the code myself.
> > Also no logs or hard output are available - files/directories
> > simply disappeared without any error messages.
> > 
> > I just did a scan of the entire /usr/src/sys tree for "\."
> > and '\.' to see what code sections might be affected - mostly
> > cache-handling. In quantity, not bad, really.
> > 
> > Others have apparently reported missing files to do with nfs,
> > I believe. This might or might not be a related problem.
> > 
> > I guess that I am asking someone who is qualified, and concerned
> > about missing files or directories, if they would be willing
> > to do what I cannot - check the code for bad interactions when
> > dot files are being written - bearing in mind that it is OTHER
> > files/directories that are disappearing from cache before being
> > written.
> > 
> > Is anyone out there sufficiently intrigued by the possibility
> > to invest some valuable time?
> > 
> > I am a QA tester, not a developer, and therefore much more
> > comfortable with discussion of symptoms and speculative
> > causality than most developers I have known. I hope that
> > someone thinks enough of the possibility to invest some 
> > time, which I know is in very short supply. I cannot deny
> > that this is (informed) speculation - there are no guarantees.
> > 
> > Regards and best wishes,
> > Tom
> 
> -- 
> Brian <br...@awfulhak.org> <br...@freebsd.org> <br...@openbsd.org>
>       <http://www.Awfulhak.org>
> Don't _EVER_ lose your sense of humour !
> 
> 
> 

Attachment: testppp.tgz
Description: Test1 and test2 results
