Mail server: 4.8-RELEASE-p3

A while back, on a couple of occasions, I posted a query about some bad behavior on my mail server. For the past several months, it has been either crashing and rebooting or just rebooting. It is ALWAYS triggered by an SSH login, but at random and ONLY at the "su" to root. The longest stretch between reboots is about 2+ weeks, contrasted with 2 in a row right after a reboot, so there is really no pattern. It has never happened directly at the console.

I have replaced every single piece of hardware (PSU, cables, NICs) and finally swapped the whole machine, except for the hard disk that contains the system, which had to remain in the new machine. Even then, I have since moved the entire system and its contents to another new HD. Thus, I concluded it to be a software problem.

There are no indications of anything in the logs, and no core dumps. It just stops and reboots at any random time it picks. Only a couple of times has it crashed without a remote login.

On a couple or more of the occurrences, I managed to get to the console quickly enough to see (in bright bold) "lockmgr locking against myself", or close to that. When I mentioned this, one tip was that I might have stale NFS mounttab entries. I cleared them out, but the problem persisted. My Google search for that error does turn up mentions of stale mounts, but mostly esoteric code discussion. No fix found anywhere.

Then, on this list, I saw the thread about others having mysterious reboots, and one suggestion was to run lsof(8) in a continuous loop so that a log of open files would be captured when these reboots occurred. I have captured 6 of these logs. I don't see anything that jumps out as a common file problem. I have placed 6 text files at the URLs below containing only 300 lines of each log, which should be enough info for a comparison. (I let the logs grow to 200MB before restarting the lsof loop each time. Of course, these samples are cut off at the moment of crash/reboot, so each one holds the last 300 lines before that moment.)

I am at a loss, other than rebuilding the system from scratch, but that is no assurance of a fix.

The one thing unique here is that it is the mail server and runs spamd (spamassassin-2.55), spamass-milter-2.0 (which has a lock file) and procmail-3.22 (which does a lot of locking). I am suspicious that the locking done by the above spam-fighting programs may clash when an SSH login and su occur.
I believe a lock is required for that too...?

I would appreciate anyone's time and effort to look at these files and see if anything is spotted that I don't see. The most recent is 6-lsof.txt, captured just this morning, and the numbering works backwards from there.

http://sageweb/tmp/1-lsof.txt
http://sageweb/tmp/2-lsof.txt
http://sageweb/tmp/3-lsof.txt
http://sageweb/tmp/4-lsof.txt
http://sageweb/tmp/5-lsof.txt
http://sageweb/tmp/6-lsof.txt

Much obliged!

Best regards,
Jack L. Stone,
Administrator

SageOne Net
http://www.sage-one.net
[EMAIL PROTECTED]
_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
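
For anyone wanting to try the same kind of capture, here is a minimal sketch of an lsof logging loop like the one described above. It is not the script actually used on this server. The log path, the 5-second interval, and the function names (capture_snapshot, tail_lines, run_forever) are all assumptions made for illustration. Only the 200MB restart size and the 300-line excerpt come from the post.

```python
import collections
import os
import subprocess
import time


def capture_snapshot(cmd, log_path, max_bytes):
    """Append one timestamped snapshot of cmd's output to log_path.

    If the log has already reached max_bytes, it is removed first, so the
    capture starts over instead of growing without bound.
    """
    if os.path.exists(log_path) and os.path.getsize(log_path) >= max_bytes:
        os.remove(log_path)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "ab") as log:
        log.write(("=== " + stamp + " ===\n").encode())
        log.write(result.stdout)
        log.flush()
        # Ask the OS to write the data to disk now, so that a sudden
        # reboot loses as little of the log as possible.
        os.fsync(log.fileno())


def tail_lines(log_path, count=300):
    """Return the last `count` lines of the log as a list of strings."""
    with open(log_path, "r", errors="replace") as log:
        return list(collections.deque(log, maxlen=count))


def run_forever(cmd=("lsof",), log_path="/var/tmp/lsof.log",
                max_bytes=200 * 1024 * 1024, interval=5):
    """Take snapshots until the machine stops (path and interval are assumed)."""
    while True:
        capture_snapshot(list(cmd), log_path, max_bytes)
        time.sleep(interval)
```

After a reboot, tail_lines("/var/tmp/lsof.log") would give the last 300 lines written before the machine went down, which is the kind of excerpt posted at the URLs above. The fsync call is a design choice for this situation: without it, the most interesting lines could still be sitting in a buffer when the box resets.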