On Thu, 2005-09-22 at 17:16, Todd Lyons wrote:
> R. Steven Rainwater wanted us to know:
> >> Have you tried running clamd and using --external on clamav-milter?
> >Just tried it. Already had two more crashes in less than 10 minutes!
> >:-(  Both were of the "write(A) return -1, expected 5: Broken pipe"
> >variety, if that means anything.
> 
> Pick up the max children setting.  See if that makes a difference.
> Watch as the number of processes builds up.

Thanks Todd, this was the first thing I've tried that helped. Prior to
0.87, we were running max children at 25 and never had problems. I
bumped it up to 40 now and that seems to have helped somewhat. We're
still getting a couple of the error messages in the log every hour but
it seems that overall, clamav-milter is now able to continue longer
before sendmail starts rejecting everything. I adjusted my cron job to
restart clamav-milter once a day instead of once an hour, so things are
relatively stable compared to yesterday.

I still find it odd that 0.87 seems so broken. All the previous versions
we've run on this machine have been very stable. And I take it this is
not a problem anyone else is seeing, making it even more of a mystery.

Taking a clue from the max children setting, I started monitoring the
processes and I now have a hunch about what's happening. I think that
certain emails are crashing clamav-milter or clamd when it reads them,
causing the processes to hang. I've noticed that each time we get one of
the errors in the log, an additional clamav-milter process gets "stuck",
so that over time, we collect more and more stuck processes until max
children is hit and everything blows up.

We occasionally get emails that take an hour or so to receive and process.
Prior to 0.87, you'd see a sendmail process along with the associated
clamav-milter and spamass-milter processes hanging out until it
finished. What happens with 0.87 is that the sendmail and SpamAssassin
processes go ahead and end at the time of the clamav-milter crash but
sometimes the clamav processes seem to stick around forever (until I
restart clamav-milter). I've got clamav-milter processes that have been
running for over 7 hours even though the associated sendmail process is
long gone. 
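
A rough sketch of how the long-running processes described above could be
spotted automatically rather than by eye. It assumes a `ps` that accepts
`-eo pid,etime,comm` and prints elapsed time as `[[dd-]hh:]mm:ss`; the
function names and the one-hour threshold are made up for illustration and
are not part of ClamAV.

```python
import subprocess


def etime_to_seconds(etime):
    """Convert a ps elapsed-time string '[[dd-]hh:]mm:ss' to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_long_running(ps_output, name="clamav-milter", min_seconds=3600):
    """Return (pid, age_in_seconds) for processes called `name` that have
    been alive for at least `min_seconds` (hypothetical threshold)."""
    found = []
    for line in ps_output.splitlines()[1:]:  # first line is the ps header
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etime, comm = fields
        if comm.strip() != name:
            continue
        age = etime_to_seconds(etime)
        if age >= min_seconds:
            found.append((int(pid), age))
    return found


def scan_live(name="clamav-milter", min_seconds=3600):
    """Run ps on this machine and report old processes named `name`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_long_running(result.stdout, name, min_seconds)
```

Calling `scan_live()` from a cron job every few minutes and logging the
result would show whether the number of old clamav-milter processes grows
each time one of the "Broken pipe" errors appears in the log. An old
process is not necessarily a stuck one, since a slow message can
legitimately take an hour, so the threshold needs to be chosen with that
in mind.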

But I guess the big question now is how can I determine for sure if it's a
specifically formatted email that's causing the clamav crashes and, if
so, how can I capture one of the emails? 

> Also check dmesg to see if it's reporting weird things such as NMI
> errors (i.e. bad memory).

I checked this and nothing unusual is being reported.

-Steve

_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
