Hi Brian,

It was a good thought, but we can't put the blame on bad hardware.

These tests were done on the RELENG_3 system cvsup'd as of Feb 22 @ 20:00 EST. All tests were run internal to the same machine. So that I don't remain the only guy in the world to see these test results, the control files are included so you can test locally :-) "ppp0 -direct" on localhost is started by port 6671. I know (now) that with the test set up this way, the two ppp processes were communicating via localhost rather than through the tunnel. However, this setup made it much easier to verify exactly how close the results were to what I saw running the server under 2.2-stable. There were differences, but the main issues are demonstrated.

You will recall our discussion about the server hanging around under 2.2-stable after the client is terminated? Required by the RFCs, you said? Under RELENG_3 the server meekly goes away, which makes sense to me.

Two tests were done. The first used "kill -KILL clientpid"; the second used "kill -TERM clientpid". In the first test, the server wrongly removed the default route. In the second test, the server did the same. In both tests, neither ppp executed the second command in the linkdown scripts. I was surprised that the first test ended immediately; I thought the LQR packets would cause the server to terminate after 1 minute.

Files:
  test1.netstat0   routing after boot
  test1.netstat1   routing after "ppp -background testloop"
  test1.psaxl      ps results for the executing processes
  test1.netstat2   routing after killing the client
  test1.tun0       ifconfig while active
  test1.tun1       ifconfig while active
  test2.netstat    routing tables after terminating the client

Logs are supplied for both tests.

I hope that this is very helpful to you. I really appreciate your efforts!!

Cheers,
Tom

> Hi,
>
> I don't claim to know a great deal about cache code etc, but I'm
> pretty sure that it's extremely unlikely that the file name has any
> chance of affecting the buffer cache.
> While NFS has its fair
> share of problems (with which Matt is dealing with admirably), I
> would think that the code that does the work there is equally unlikely
> to know anything about file names.
>
> Having said all that in as vague a way as possible, the reason I'm
> posting this is that you seem to be experiencing difficulties with
> ppp that are of a similar nature - that is, completely inexplicable
> and unseen by anyone else - disappearing default routes, ppp.linkdown
> not being processed,
>
> I'm beginning to suspect a hardware problem - perhaps with your disk
> controller or something. This wouldn't easily explain the default
> route problem, but may explain the failure to process ppp.linkdown....
>
> Maybe you could try treating the other machine (your son's machine?)
> as the gateway, and see if things become more stable. If they do,
> the finger might be pointed more firmly at hardware.
>
> > On the weekend I reported to hackers about problems experienced with
> > 2.2-stable and RELENG-3 systems where I experienced files that
> > disappeared from cache and Mail directories that disappeared.
> > The RELENG-3 system had files affected with softupdates enabled.
> > The 2.2-stable system had sub-directories missing from the
> > same directories that I was writing to via nfsv2.
> >
> > By coincidence, I had cvsup'd and compiled new kernels and naturally
> > made the assumption that there was causality there. Subsequently
> > I have come to believe that the problem may have more to do with what
> > I was doing, not changes to the code.
> >
> > For about 3-4 hours prior to noticing the problems, I had been
> > repetitively editing dot files, then writing a kludge of dot files
> > to the local system hard drive and to the nfs exported FS of the
> > other computer, while occasionally checking mail on that computer.
> >
> > All files and directories missing were being updated for
> > one reason or another by myself or by mail processes while
> > I was doing this.
> >
> > It is speculation, but there is a good chance that there is a bug
> > in the cache-handling code that causes problems with other files
> > or directories being dropped from cache because of bad processing
> > common to BOTH or ALL releases, when large numbers of dot files are
> > being written. The dot files themselves did not disappear - other
> > items to be written disappeared before their writes actually
> > occurred.
> >
> > I know that this is a frustrating kind of message to receive, but
> > I am not a developer & not qualified to go into the code myself.
> > Also no logs or hard output are available - files/directories
> > simply disappeared without any error messages.
> >
> > I just did a scan of the entire /usr/src/sys tree for "\."
> > and '\.' to see what code sections might be affected - mostly
> > cache-handling. In quantity, not bad, really.
> >
> > Others have apparently reported missing files to do with nfs
> > I believe. This might or might not be a related problem.
> >
> > I guess that I am asking someone who is qualified, and concerned
> > about missing files or directories, if they would be willing
> > to do what I cannot - check the code for bad interactions when
> > dot files are being written - bearing in mind that it is OTHER
> > files/directories that are disappearing from cache before being
> > written.
> >
> > Is anyone out there sufficiently intrigued by the possibility
> > to invest some valuable time?
> >
> > I am a QA tester, not a developer, and therefore much more
> > comfortable with discussion of symptoms and speculative
> > causality than most developers I have known. I hope that
> > someone thinks enough of the possibility to invest some
> > time, which I know is in very short supply. I cannot deny
> > that this is (informed) speculation - there are no guarantees.
> >
> > Regards and best wishes,
> > Tom
>
> --
> Brian <br...@awfulhak.org> <br...@freebsd.org> <br...@openbsd.org>
>       <http://www.Awfulhak.org>
> Don't _EVER_ lose your sense of humour !

testppp.tgz
Description: Test1 and test2 results