No response on -current and I have an update. After moving to -current source via cvs -D (1999//D99.08.28.21.00.00) the servers are infinitely more stable, running continuously for three days, and showing no signs of dying. I get sockname errors once in a great while, and only one at a time. I get the VM error mentioned below once in a great while, but I'm fairly certain I've tracked that down to a problem with the miva processing engine binary itself. I haven't had any of the "errant apache children not dying" errors, and only one of the calcru errors.

I have come up with a theory on this and I'd appreciate it if someone could comment on it. We get pre-compiled binaries from the Miva Corp. people that I'm 99.99% sure are built on a 2.2.x or 3.x machine (waiting on confirmation now). So what I'm thinking (based mostly on the sockname error) is that there is a sort of "library creep" happening, where the version of the library that the binary is expecting and the version it's finding are just a bit out of sync. I am wondering if adding the appropriate compat libraries to these systems would help, and if so, how would I specify that this specific binary use those libraries as opposed to the ones in /usr/lib?

Any insights on this would be greatly appreciated. Here are some details on the binaries; let me know if anything else is needed.

Thanks,

Doug

ldd miva
miva:
        libcrypt.so.2 => /usr/lib/libcrypt.so.2 (0x280d3000)
        libc.so.3 => /usr/lib/libc.so.3 (0x280e9000)
        libm.so.2 => /usr/lib/libm.so.2 (0x2816c000)

file miva
miva: setuid sticky ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, stripped

--
"Stop it, I'm gettin' misty."
    - Mel Gibson as Porter, "Payback"

---------- Forwarded message ----------
Date: Wed, 29 Sep 1999 12:28:59 -0700 (PDT)
From: Doug <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Weird sockname errors with -current and apache

Greetings,

I'm using -current on some web server/CGI processing machines. Yes, I know all about using -current on production stuff, but we need the NFS, et al. fixes due to the heavy NFS client activity on these systems, and I'm willing to take the good with the bad.

I cvsup'ed and built world and kernel on or about 8/26 and these boxes ran fine for about 26 days. On 9/22 (Wednesday) I cvsup'ed and built world and kernel on one machine in order to take advantage of Matt's latest round of NFS, etc. fixes. That box ran well for two days, so I updated the rest of them on Friday (9/24) and took off for a happy weekend. Well, you know what happened: one box locked up on Saturday, I came in and rebooted it, then the other 4 boxes locked up on Sunday. *sigh*

The really annoying thing here is that there isn't ONE clear problem that I can point to. Also, when the boxes die they wedge solid. No console, serial or otherwise, and no DDB, so I can't find out exactly what they are doing when they die. I have the DDB_UNATTENDED option in the kernel because I have the boxes set up to recover themselves on boot and go back into service (prior to the 26 day uptime, panics were common). I'm starting to think I should disable that; however, as far as I can see they aren't panic'ing, they are just freezing up, although they are ping'able.

We started out this project with Apache 1.3.6, and on Sept. 7 we moved to 1.3.9. These are dual PIII 500 machines with a half gig of RAM each.

The other annoying thing is that while I was checking the kernel, etc. logs for signs of problems, it hadn't occurred to me to check the apache error log.
Once I did, I noticed that at least some of the symptoms I'm seeing go back as far as I have logs, even before the blessed 26 day uptime period.

Here is what I've seen. The first error I can find in any of the logs I have that seems related to the problem is this from apache's error log:

[Fri Aug 20 10:59:34 1999] [error] (22)Invalid argument: getsockname

Subsequently I've noticed that we get this error a LOT, usually coinciding with a period of time where the machine is wedged, after which it sometimes comes back, and sometimes doesn't (i.e., it stays wedged). When this happens it usually repeats about 15-20 times, followed by:

Virtual memory exceeded in `new'

then a NULL character (^@) in the apache log. Those errors are usually accompanied by a slew of "Premature end of script headers" messages, apparently related to the CGI process that these web servers run dying off before it finishes writing out its data.

We also have a slew of these errors in the apache logs at various times (there doesn't *seem* to be a correlation with the others, but I'm not sure) that look like:

[Mon Sep 13 12:51:03 1999] [warn] child process 82600 still did not exit, sending a SIGTERM
[Mon Sep 13 12:51:03 1999] [warn] child process 83437 still did not exit, sending a SIGTERM
[Mon Sep 13 12:51:03 1999] [warn] child process 84136 still did not exit, sending a SIGTERM
[Mon Sep 13 12:51:03 1999] [warn] child process 83698 still did not exit, sending a SIGTERM
[Mon Sep 13 12:51:03 1999] [warn] child process 83703 still did not exit, sending a SIGTERM

Sometimes these happen at the same time as the other errors, sometimes they don't. When this one happens we get about 40 of them in a row.

In the system logs the only unusual thing I've seen (and I enable a LOT of logging) are these messages, which started over this past weekend.

/kernel: calcru: negative time of 4347162 usec for pid 6806 (httpd)

Once again, when these come they come in bunches, sometimes with a positive time value like this one, sometimes with a negative one. I'm used to seeing calcru messages related to the kernel misjudging the speed of the processor, but the recently added code that tells you the speed on SMP systems says that I have

CPU: Pentium III (498.75-MHz 686-class CPU)

which looks right to me.

Now, as if the above were not annoying enough, all of these problems could very well be related to the third party CGI processing engine (a program called Miva) which we have tracked down some bugs in before. Of course the machines freezing up is my main concern at this point, but the errors themselves could be coming from miva.

Any suggestions on how to debug this problem further would be greatly appreciated. I'm going to start up some boxes today that don't have the DDB_UNATTENDED option enabled to see if they will in fact panic and drop to the debugger. Beyond that, I'm at a bit of a loss here.

TIA,

Doug

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
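
Since it is unclear above whether the getsockname, "still did not exit", and "Premature end of script headers" messages actually coincide, one way to check would be to tally them per hour from the apache error log. What follows is only a minimal Python sketch under stated assumptions: the category names and the tally() helper are hypothetical, the timestamp layout is assumed from the log lines quoted above, and lines without a leading timestamp (such as the "Virtual memory exceeded" message) are simply skipped.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Substrings are taken from the messages quoted in this mail;
# the category names on the left are invented for illustration.
PATTERNS = {
    "getsockname": re.compile(r"getsockname"),
    "child_no_exit": re.compile(r"still did not exit"),
    "premature_end": re.compile(r"Premature end of script headers"),
}

# Assumed timestamp layout: [Fri Aug 20 10:59:34 1999]
STAMP = re.compile(r"^\[(\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})\]")


def tally(lines):
    """Count matching messages per (hour, category)."""
    counts = Counter()
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # no timestamp on this line, so it cannot be bucketed
        when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        hour = when.strftime("%Y-%m-%d %H:00")
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(hour, name)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller; errors="replace" tolerates
    # stray bytes such as the NUL character mentioned above.
    with open(sys.argv[1], errors="replace") as log:
        for (hour, name), n in sorted(tally(log).items()):
            print(hour, name, n)
```

Hours in which more than one category shows a burst would be the ones worth lining up against the times the boxes wedged.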