Hello all,

I searched SA's website and Google, and scanned the past several weeks of emails to this list, without luck. I hope someone can help me out.

A week or two ago, SA started randomly sucking up huge amounts of memory in one or more of the spamd children. I added the --max-conn-per-child=25 switch and found that the problem still happened, but would eventually resolve itself. On closer inspection, I found that spamd is hanging on a message and gradually sucking up more and more memory (anywhere from 400 to 800 MB). Eventually spamd finishes that message and moves on; once the child hits the 25-connection limit, it is restarted and the memory is released.

I have managed to catch 3 of the messages that it has hung on. All 3 happen to be spam, but I can't be sure it hasn't happened on good mail, since I only have 3 so far. Two of the messages took just over 1000 seconds to scan and the third just over 600 seconds (on two different servers). The spam report is generated as normal and everything continues as if nothing happened. The spam is tiny, basically just one of those "visit this link" emails. In my logs, I see other very similar spams coming through around the same time without problems.

I found a reference to a problem with corruption in the bayes db, but db_verify doesn't report any issues with my db. The 1000- and 600-second scan times make me suspect some sort of odd timeout, but I'm not sure where.

I am running SA 3.02, and it is happening on both SuSE 9.1 and Red Hat 9 servers using Exim/Exiscan. I have set lock_method to flock, turned on bayes_learn_to_journal, and the servers are running caching name servers. I am hoping to catch it again and get an strace of the affected child. In the meantime, any suggestions would be greatly appreciated.
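To catch a runaway child while it is still hanging, something like the following could be run from cron or a shell loop. It is only a minimal Python sketch, assuming Linux /proc (VmRSS in /proc/<pid>/status, NUL-separated /proc/<pid>/cmdline) and that the children's command lines contain "spamd". The helper names and the 200 MB default threshold are my own choices, not anything from SpamAssassin.

```python
import os
import sys


def vmrss_kb(status_text):
    """Return resident memory in kB from a /proc/<pid>/status file's text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:   512000 kB"
            return int(line.split()[1])
    return 0  # kernel threads and zombies have no VmRSS line


def find_large_processes(threshold_kb, match="spamd", proc_root="/proc"):
    """List (pid, rss_kb, cmdline) for processes whose cmdline contains
    `match` and whose resident memory is at least `threshold_kb`."""
    if not os.path.isdir(proc_root):
        return []  # no /proc on this system; nothing to scan
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
            with open(os.path.join(base, "status")) as fh:
                rss = vmrss_kb(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if match in cmdline and rss >= threshold_kb:
            hits.append((int(entry), rss, cmdline.strip()))
    return sorted(hits)


if __name__ == "__main__":
    # Threshold in kB; defaults to 200 MB (an arbitrary assumption).
    threshold_kb = int(sys.argv[1]) if len(sys.argv) > 1 else 200 * 1024
    for pid, rss, cmd in find_large_processes(threshold_kb):
        print(f"{pid}\t{rss} kB\t{cmd}")
        print(f"  attach with: strace -f -tt -p {pid}")
```

Run it as root with the threshold in kB as the only argument; each oversized spamd process is printed along with an strace command line that could be used to attach to it before it finishes the message.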

Thanks!

--
Dennis Skinner
Systems Administrator
BlueFrog Internet
http://www.bluefrog.com
