I've been running SA on Debian stable for a few years now without any real issues, upgrading SA and related packages as needed: originally through Debian, later from source, and last night (in a vain attempt to fix the problem I'm about to describe) through backports.org.

A few days ago all of the spam started pouring in. I'm not aware of any recent changes other than a few standard upgrades through the Debian package system, which may or may not have upgraded a few Perl modules. A closer investigation showed no markup on the e-mails, so SA wasn't checking them, and turned up this message in the logs:

sm-mta[23911]: j45GLfLE023892: timeout waiting for input from local during Draining Input

I restarted SA and everything cleared. I shrugged and went about my business. 12-24 hours later it happened again. I quit shrugging and started digging. I noted that telnetting straight into SA would yield a connection, but no response from SA. I turned on debugging, restarted and waited. 12-24 hours later it died again. I checked the logs and found nothing. It logs just like everything is normal and then at some point it stops logging, and it stops right after it finishes a transaction, so it's not like it's stopping in the middle of one. Also, it seems to stop a bit (a few hours, I think) before I start getting the Draining Input messages (BTW, I've got sendmail handing off to procmail, which calls SA). Closer inspection of the running processes shows that my main SA process is sitting there fine:

/usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -D -d --pidfile=/var/run/spamd.pid

But there are no children in sight. (I think once, during one failure, I actually saw 1 child. Most of the time I see none, though.) There are, however, plenty of spamc clients sitting around waiting on service.
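For anyone wanting to catch the hang sooner than the sendmail timeouts do, the "connects but never answers" symptom can be probed automatically. This is only a rough sketch: it assumes spamd is listening on TCP at 127.0.0.1 on the default port 783 and that it answers the spamd protocol's PING command with a line containing PONG. The function name `spamd_ping` and the status strings are made up for illustration.

```python
import socket


def spamd_ping(host="127.0.0.1", port=783, timeout=10.0):
    """Probe spamd with a protocol-level PING.

    Returns one of four illustrative status strings:
      "alive"                - got a reply containing PONG
      "connected-but-silent" - TCP connect worked, but no reply in time
      "connect-failed"       - could not connect at all
      "unexpected-reply"     - got data (or EOF) that was not a PONG
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The same timeout applies to the read below.
            sock.sendall(b"PING SPAMC/1.2\r\n")
            try:
                reply = sock.recv(256)
            except socket.timeout:
                # The kernel accepted the connection, but no spamd
                # process picked it up and answered.
                return "connected-but-silent"
    except OSError:
        return "connect-failed"
    if b"PONG" in reply:
        return "alive"
    return "unexpected-reply"
```

Run from cron every few minutes, a result of "connected-but-silent" would match the state described above, where the master is still holding the listening socket but no child is servicing it.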

I upgraded SA last night to the latest version (the one I was running wasn't that old, but it was still back in the 2 series) to:

SpamAssassin version 3.0.2
  running on Perl version 5.6.1

Well, today it's still pulling this stunt on me. Given that an upgrade didn't fix it, and I don't see a lot of other people talking about this issue, I suspect it has to do with the Perl libraries and threading. Threading in Perl isn't something I know a lot about, though. For kicks and grins I decided to see if it could respawn children at all and locked --max-conn-per-child down to 1. It definitely respawns without a problem. The log clearly shows:

spamd[30798]: server hit by SIGCHLD
spamd[30798]: handled cleanup of child pid 5442
spamd[30798]: server successfully spawned child process, pid 5525

Well, I let that run for a few minutes and it wouldn't die, so for performance I cranked --max-conn-per-child up to 25 and stopped/started SA (I get a little paranoid about HUP and restart sometimes). About 2-3 hours later (much sooner this time) it died again. The process list looked a bit different this go-around, though:

5586 ? S 1:07 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir --max-conn-per-child 25 -d --pidfile=/var/run/spamd.pid
8481 ? S 0:00 spamc
10090 ? S 0:00 spamc
10117 ? S 0:00 spamc
10276 ? S 0:00 spamc
10703 ? S 0:00 spamc
10787 ? S 0:00 spamc
11266 ? S 0:00 spamc
11270 ? S 0:00 spamc
11681 ? S 0:00 spamc
11967 ? S 0:00 spamc
11978 ? S 0:00 spamc
11995 ? S 0:00 spamc
12203 ? S 0:00 spamc
12305 ? S 0:00 spamc
12328 ? S 0:00 spamc
12494 ? S 0:00 spamc
12740 ? S 0:00 spamc
12745 ? S 0:00 spamc
13226 ? S 0:00 spamc
13236 ? S 0:00 spamc
13579 ? S 0:00 spamc
13773 ? S 0:00 spamc
14818 ? S 0:00 spamc
15222 ? S 0:00 spamc
15290 ? S 0:00 spamc
15322 ? S 0:00 spamc
16526 ? S 0:00 spamc
17537 ? S 0:00 spamc
17680 ? S 0:00 spamc
18067 ? S 0:00 spamc
18563 ? S 0:00 spamc
18717 ? S 0:00 spamc
18942 ? S 0:00 spamc
19079 ? S 0:00 spamc
19819 ? S 0:00 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir --max-conn-per-child 25 -d --pidfile=/var/run/spamd.pid


Notice the two different master processes - /usr/sbin/spamd (or that's what I assume they are anyway). For normal operation, things look like this:

20088 ? S 0:00 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir --max-conn-per-child 25 -d --pidfile=/var/run/spamd.pid
20097 ? S 0:03 spamd child
20098 ? S 0:03 spamd child
20099 ? S 0:03 spamd child
20100 ? S 0:02 spamd child
20101 ? S 0:03 spamd child



Not sure why there were two master processes there in that crash. Both appeared to be using the same pid file. Maybe it's just an unrelated anomaly (I always try to be aware of the fact that any problem I troubleshoot could actually be multiple unrelated problems occurring at the same time).
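The comparison between the hung and the normal process lists can also be done mechanically. The following is a minimal sketch, assuming input in the same five-column layout as the listings above (PID, TTY, STAT, TIME, COMMAND) and assuming that children show up as "spamd child" as they do in the normal listing. The function names are hypothetical.

```python
from collections import Counter


def classify(command):
    """Label one COMMAND column as master, child, client, or other."""
    words = command.split()
    if not words:
        return "other"
    if command.startswith("spamd child"):
        return "child"
    # Strip any leading path, so /usr/sbin/spamd and spamd both match.
    prog = words[0].rsplit("/", 1)[-1]
    if prog == "spamd":
        return "master"
    if prog == "spamc":
        return "client"
    return "other"


def summarize(ps_text):
    """Count process roles in ps-style text (PID TTY STAT TIME COMMAND)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        counts[classify(fields[4])] += 1
    return counts


def looks_stalled(counts):
    """True when clients are waiting but no child exists to serve them."""
    return counts["child"] == 0 and counts["client"] > 0
```

Feeding it the failed listing would report two masters, no children, and the queue of waiting spamc clients, so `looks_stalled` would return True. The normal listing would report one master and five children.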



Anyway, I'm not real sure where to go with this other than to hit CPAN and start upgrading Perl libraries that look like they might be involved (a very un-Debian-like thing to do). Does anyone have experience with this, or can anyone point to the possible problem?


Stats:

Linux version 2.4.20 (gcc version 2.95.4 20011002 (Debian prerelease)) #11 Mon Dec 1 18:39:20 EST 2003
Debian stable
backports.org SA (SpamAssassin version 3.0.2)
Perl 5.6.1
procmail v3.22 2001/09/10
Sendmail 8.12.3+3.5Wbeta/8.12.3/Debian-7.1
POSIX.pm version 1.03
System uptime: Around 520 days
Intel(R) Pentium(R) 4 CPU 2.40GHz
512MB RAM
Dell PowerEdge Server
More than enough free disk space on all partitions.
Load average of around 0.5
Around 106 processes
2GB of swap space
Around 100MB of free memory


There's probably something someone wants that I've not supplied here. Tell me what it is and I'll dig it out.

Any help would be appreciated.

Thanks!

Gene





