Hello,

Using: spamassassin 3.2.5 on a CentOS 5.2 system.
Unfortunately the spamd process on one of our mail servers crashed early this morning. The system mail log showed:

==========================================================
Jan 31 06:52:00 tracy spamd[23255]: spamd: connection from localhost.localdomain [127.0.0.1] at port 45028
Jan 31 06:52:13 tracy spamd[2347]: spamd: server killed by SIGTERM, shutting down
Jan 31 06:52:24 tracy spamd[26043]: server socket setup failed, retry 1: spamd: could not create INET socket on 127.0.0.1:783: Address already in use
Jan 31 06:52:25 tracy spamd[23255]: spamd: checking message <200901310651.n0v6pxad026...@isg-prod-loader.informa.com> for sauser:10001
Jan 31 06:52:25 tracy spamd[26043]: server socket setup failed, retry 2: spamd: could not create INET socket on 127.0.0.1:783: Address already in use
Jan 31 06:52:26 tracy spamd[26043]: spamd: could not create INET socket on 127.0.0.1:783: Address already in use
Jan 31 06:52:31 tracy spamd[23255]: spamd: clean message (-6.6/8.0) for sauser:10001 in 30.9 seconds, 5194 bytes.
Jan 31 06:52:31 tracy spamd[23255]: spamd: result: . -6 - BAYES_00,RCVD_IN_DNSWL_MED scantime=30.9,size=5194,user=sauser,uid=10001,required_score=8.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=45028,mid=<200901310651.n0v6pxad026...@isg-prod-loader.informa.com>,bayes=0.000000,autolearn=ham
Jan 31 06:52:31 tracy spamd[23255]: syswrite() to parent failed: Broken pipe at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 576.
==========================================================

My first thought was a bug in the SpamdForkScaling.pm module, but I'm not so sure. At 06:52 spamd was fine, but we have an sa-update/sa-compile job that runs at around that time. The files in /var/lib/spamassassin/compiled indicate that the job was running (or finishing) at 06:52. The job (if successful) then restarts spamassassin (using 'service spamassassin restart').
Now, the above log shows that at 06:52:13 SA received a shutdown signal, which is correct when restarting. But at 06:52:24 it seems to be trying to start up but cannot, because SA is still running (the port is in use). Then at 06:52:31 it seems that an SA scan finishes and, because SA was trying to restart, the parent process was gone; hence the syswrite error.

Okay, so looking at the SA startup script it shows (this is within a shell 'case' statement):

==========================================================
  stop)
        # Stop daemons.
        echo -n $"Stopping $prog: "
        killproc spamd
        RETVAL=$?
        echo
        if [ $RETVAL = 0 ]; then
                rm -f /var/lock/subsys/spamassassin
                rm -f $SPAMD_PID
        fi
        ;;
  restart)
        $0 stop
        sleep 3
        $0 start
        ;;
==========================================================

I suspect the problem is that the 'stop' actually failed (RETVAL != 0). But since the 'restart' doesn't check this, it then just went on and tried to 'start' SA. This failed because SA still had a process/child running. Ultimately it meant that our mail server ended up with SA not running.

Perhaps the RedHat (and hence, I assume, Fedora/CentOS) startup script should be a bit more aggressive in checking that SA has actually stopped before trying to start it again? I think I would rather that more time was spent on ensuring that SA was stopped, so that it could then start, rather than it completely failing and the server being left without SA running.

John.

-- 
---------------------------------------------------------------
John Horne, University of Plymouth, UK
Tel: +44 (0)1752 587287    E-mail: john.ho...@plymouth.ac.uk
Fax: +44 (0)1752 587001
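
To illustrate the kind of check suggested above, here is a rough sketch of a helper that polls until a "still running?" test fails, instead of relying on a fixed 'sleep 3'. It is not taken from the RedHat script: the function name wait_until_stopped, the 30 second limit and the use of 'pgrep -x spamd' as the test are all assumptions made for the example, and it assumes the spamd parent and children all show up under the process name "spamd".

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not part of the distribution script).
# usage: wait_until_stopped TIMEOUT_SECONDS CHECK_COMMAND [ARGS...]
# CHECK_COMMAND must exit 0 for as long as the daemon is still running.
wait_until_stopped() {
    wus_timeout=$1
    shift
    wus_waited=0
    # Re-run the check once a second until it fails or we run out of time.
    while "$@" >/dev/null 2>&1; do
        if [ "$wus_waited" -ge "$wus_timeout" ]; then
            return 1    # still running after TIMEOUT_SECONDS
        fi
        sleep 1
        wus_waited=$((wus_waited + 1))
    done
    return 0            # the check failed, so nothing is left running
}

# Hypothetical replacement for the restart) branch of the case statement:
#
#   restart)
#         $0 stop
#         if wait_until_stopped 30 pgrep -x spamd; then
#                 $0 start
#         else
#                 echo "spamd still running after 30s, not starting" >&2
#                 RETVAL=1
#         fi
#         ;;
```

With something like this the 'start' would only be attempted once no spamd process (parent or child) is left holding port 783. What to do when the timeout expires is a separate choice; the sketch just reports failure, but a script could equally escalate to a harder kill and then start.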