Me again. Since I haven't received any responses, I'd better post more information; I did some more investigating today.

Sometimes when I run sa-learn --force-expire I get this response almost immediately:
Bus error (core dumped)
When I run it again, the process just hogs until I interrupt it after about 15 minutes.

I have also changed bayes_learn_to_journal back to 0 and lock_method to flock.

Now I get these in spamd.log:
Mon Sep 25 17:05:18 2006 [8853] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call
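
To see whether these warnings come from a few children or from all of them, the log can be tallied per PID. The following is only a minimal sketch: it assumes every entry sits on one line shaped like the excerpts in this post (timestamp, [pid], level, message), and the function name lock_failures_by_pid is made up for illustration, not part of SpamAssassin.

```python
import re
from collections import Counter

# Matches lines shaped like the spamd.log excerpts in this post, e.g.
#   "Mon Sep 25 17:05:18 2006 [8853] warn: bayes: ..."
# (assumed layout: timestamp, [pid], level, message)
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}) "
    r"\[(?P<pid>\d+)\] (?P<level>\w+): (?P<msg>.*)$"
)


def lock_failures_by_pid(lines):
    """Count 'lock failed' warnings per spamd child PID."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "warn" and "lock failed" in m.group("msg"):
            counts[int(m.group("pid"))] += 1
    return counts
```

Any iterable of lines works as input, including an open file handle on spamd.log; calling .most_common(5) on the result lists the PIDs with the most lock failures.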

I also lowered --max-children from 8 to 6 with this result:
Mon Sep 25 17:11:03 2006 [6702] info: prefork: server reached --max-children setting, consider raising it

Here's some top output of a typical situation:
 PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
8287 spamd    132    0 48056K 44220K RUN      8:00 88.43% 88.43% perl5.8.7
8853 spamd     20    0 40416K 38356K lockf    0:11  1.32%  1.32% perl5.8.7
9128 spamd     20    0 38592K 36544K lockf    0:03  0.63%  0.63% perl5.8.7
8879 spamd     20    0 40804K 38484K lockf    0:08  0.59%  0.59% perl5.8.7
9103 spamd     20    0 39728K 37736K lockf    0:04  0.54%  0.54% perl5.8.7

-rw-------  1 spamd  wheel        45 Sep 25 17:04 bayes.mutex
-rw-------  1 spamd  wheel    240024 Sep 25 17:15 bayes_journal
-rw-------  1 spamd  wheel   1039920 Sep 25 17:04 bayes_journal.old
-rw-r--r--  1 spamd  wheel  83787776 Sep 25 16:09 bayes_seen
-rw-------  1 spamd  wheel  85901312 Sep 25 17:04 bayes_toks

# cat bayes.mutex
8287
6708
6708
6708
6708
6708
6708
6708
6708
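
In the listing above, 8287 is the process using 88% CPU in the top output, while 6708 does not appear in that excerpt. A quick way to check whether the listed PIDs still belong to running processes is sketched below. It assumes each line of bayes.mutex is a PID, which is only what the contents look like here; what spamd actually means by them is not confirmed. The helper names are made up for illustration.

```python
import os


def mutex_pids(text):
    """Return the distinct PIDs listed in the text, in first-seen order.

    Assumption: each non-empty line is a PID; other lines are skipped.
    """
    seen = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            pid = int(line)
            if pid not in seen:
                seen.append(pid)
    return seen


def pid_alive(pid):
    """Check for a process with this PID; signal 0 delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Reading bayes.mutex, passing its contents to mutex_pids and calling pid_alive on each result shows which entries point at processes that no longer exist.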


What is wrong?! What is making spamd go *kaboom* several times an hour?
Is it something with expiring tokens that's not working correctly?
Is it normal to have a bayes_journal.old lying around?
What more can I do to find the cause?

If the core dump (22 MB) is of any interest, I'll upload it somewhere.



Best regards,
Andreas





Andreas Pettersson wrote:

Ok, more information here.

I found in spamd.log this line when the problem started:
Fri Sep 22 19:55:22 2006 [74581] warn: bayes: expire_old_tokens: child processing timeout at /usr/local/bin/spamd line 1082

which was followed by lots of these:
Fri Sep 22 19:55:52 2006 [74581] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: File exists

In an attempt to find out what's wrong I changed bayes_learn_to_journal to 1. It didn't help, but at least I got rid of the 'lock failed: File exists' error messages in spamd.log, and bayes keeps working. For the moment I have a script that checks for the existence of bayes.lock, kills the hogging process and removes the lock file. It runs every minute.
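
The actual script is not shown in this post. As a rough sketch of just the lock-file part of such a watchdog: the function below removes the lock only once it is older than a threshold, so a lock that a healthy process took a moment ago is left alone. The age threshold and the name remove_stale_lock are assumptions for illustration, not taken from the script described above, and finding and killing the hogging process is left out.

```python
import os
import time


def remove_stale_lock(lock_path, max_age_seconds, now=None):
    """Remove lock_path if it exists and is at least max_age_seconds old.

    Returns True if the file was removed, False otherwise.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False  # no lock present, nothing to do
    if age < max_age_seconds:
        return False  # lock is recent; assume its holder is still working
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        return False  # the holder released it between stat and remove
    return True
```

Run once a minute from cron against the bayes.lock path, this would only clear locks that have been held for longer than the chosen threshold.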


I have tried changing lock_method to flock, but the problem is still there (with a new lock file name). I also tried sa-learn --force-expire. It took about 30 seconds to complete, but it didn't solve my problem either.


Any ideas of what might be wrong?

Regards,
Andreas


