Me again. Since I'm not getting any responses, I'd better keep posting more
information, as I've done some more investigating today.
Sometimes when I run sa-learn --force-expire I get this response almost
immediately:
Bus error (core dumped)
When I run it again, the process just hogs until I interrupt it after about 15
minutes.
I have also changed bayes_learn_to_journal back to 0 and lock_method to
flock.
Now I get these in spamd.log:
Mon Sep 25 17:05:18 2006 [8853] warn: bayes: cannot open bayes databases
/usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed:
Interrupted system call
I also lowered --max-children from 8 to 6 with this result:
Mon Sep 25 17:11:03 2006 [6702] info: prefork: server reached
--max-children setting, consider raising it
Here's some top output of a typical situation:
PID  USERNAME PRI NICE   SIZE    RES STATE  TIME   WCPU    CPU COMMAND
8287 spamd    132    0 48056K 44220K RUN    8:00 88.43% 88.43% perl5.8.7
8853 spamd     20    0 40416K 38356K lockf  0:11  1.32%  1.32% perl5.8.7
9128 spamd     20    0 38592K 36544K lockf  0:03  0.63%  0.63% perl5.8.7
8879 spamd     20    0 40804K 38484K lockf  0:08  0.59%  0.59% perl5.8.7
9103 spamd     20    0 39728K 37736K lockf  0:04  0.54%  0.54% perl5.8.7
-rw-------  1 spamd  wheel        45 Sep 25 17:04 bayes.mutex
-rw-------  1 spamd  wheel    240024 Sep 25 17:15 bayes_journal
-rw-------  1 spamd  wheel   1039920 Sep 25 17:04 bayes_journal.old
-rw-r--r--  1 spamd  wheel  83787776 Sep 25 16:09 bayes_seen
-rw-------  1 spamd  wheel  85901312 Sep 25 17:04 bayes_toks
# cat bayes.mutex
8287
6708
6708
6708
6708
6708
6708
6708
6708
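A quick way to see which of the PIDs listed in bayes.mutex still belong to live processes could look like the sketch below. It assumes the file holds one PID per line, as in the output above, and that it runs on the same Unix host as spamd. The helper names pid_alive and summarize_mutex are hypothetical.

```python
import os


def pid_alive(pid):
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def summarize_mutex(text, alive=pid_alive):
    """Map each distinct PID in the mutex file contents to a liveness flag.

    Assumption: the file contains whitespace-separated decimal PIDs.
    """
    pids = []
    for token in text.split():
        if token.isdigit() and int(token) not in pids:
            pids.append(int(token))
    return {pid: alive(pid) for pid in pids}
```

Calling summarize_mutex(open("bayes.mutex").read()) from the bayes directory would then show whether the PIDs recorded there still exist. It says nothing about whether those processes actually hold the lock.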
What is wrong?! What is making spamd go *kaboom* several times an hour?
Is it something with expiring tokens that's not working correctly?
Is it normal to have a bayes_journal.old lying around?
What more can I do to find the cause?
If the core dump (22 MB) is of any interest, I'll upload it somewhere.
Best regards,
Andreas
Andreas Pettersson wrote:
Ok, more information here.
I found in spamd.log this line when the problem started:
Fri Sep 22 19:55:22 2006 [74581] warn: bayes: expire_old_tokens: child
processing timeout at /usr/local/bin/spamd line 1082
which was followed by lots of these:
Fri Sep 22 19:55:52 2006 [74581] warn: bayes: cannot open bayes
databases /usr/local/share/spamassassin/bayes/bayes_* R/W:
lock failed: File exists
In an attempt to find out what's wrong I changed bayes_learn_to_journal to
1. It didn't help, but at least I got rid of the 'lock failed: File
exists' error messages in spamd.log, and bayes keeps working. For
the moment I have a script that checks for the existence of bayes.lock,
kills the hogging process and removes the lock file. It runs every
minute.
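A minimal sketch of a watchdog of that kind, not the actual script, might look like this. The lock path, the 300-second threshold, and the idea that PIDs can be read from the lock file as plain decimal tokens are all assumptions; the function names are hypothetical.

```python
import os
import signal
import time

# Assumed location and threshold; adjust to the local setup.
LOCK_PATH = "/usr/local/share/spamassassin/bayes/bayes.lock"
MAX_AGE = 300  # seconds a lock may exist before it is treated as stale


def lock_age(path, now=None):
    """Return the lock file's age in seconds, or None if it does not exist."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    return (time.time() if now is None else now) - mtime


def read_pids(path):
    """Collect integer tokens from the file (assumes PIDs are stored as text)."""
    with open(path) as fh:
        return sorted({int(tok) for tok in fh.read().split() if tok.isdigit()})


def clear_stale_lock(path=LOCK_PATH, max_age=MAX_AGE, kill=os.kill):
    """Terminate the recorded processes and remove the lock if it is too old.

    Returns True if a stale lock was removed, False otherwise.
    """
    age = lock_age(path)
    if age is None or age < max_age:
        return False
    for pid in read_pids(path):
        try:
            kill(pid, signal.SIGTERM)  # caution: a recorded PID may have been reused
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to signal
    os.remove(path)
    return True
```

Run from cron every minute, calling clear_stale_lock() only treats the symptom. It does not explain why the expiry run holds the lock for so long in the first place.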
I have tried changing lock_method to flock; the problem is still there (but
with a new lock file name).
I also tried sa-learn --force-expire. It took about 30 seconds to
complete. It didn't solve my problem either.
Any ideas of what might be wrong?
Regards,
Andreas