On Thu, 2005-09-22 at 17:16, Todd Lyons wrote:
> R. Steven Rainwater wanted us to know:
>
> >> Have you tried running clamd and using --external on clamav-milter?
> >Just tried it. Already had two more crashes in less than 10 minutes!
> >:-( Both were of the "write(A) return -1, expected 5: Broken pipe"
> >variety, if that means anything.
>
> Pick up the max children setting. See if that makes a difference.
> Watch as the number of processes build up.

Thanks Todd, this was the first thing I've tried that helped. Prior to
0.87, we were running max children at 25 and never had problems. I
bumped it up to 40 and that seems to have helped somewhat. We're still
getting a couple of the error messages in the log every hour, but
overall clamav-milter is now able to run longer before sendmail starts
rejecting everything. I adjusted my cron job to restart clamav-milter
once a day instead of once an hour, so things are relatively stable
compared to yesterday.

I still find it odd that 0.87 seems so broken. All the previous versions
we've run on this machine have been very stable. And I take it this is
not a problem anyone else is seeing, which makes it even more of a
mystery.

Taking a clue from the max children setting, I started monitoring the
processes and I now have a hunch about what's happening. I think that
certain emails are crashing clamav-milter or clamd when it reads them,
causing the processes to hang. I've noticed that each time we get one of
the errors in the log, an additional clamav-milter process gets "stuck",
so that over time we collect more and more stuck processes until max
children is hit and everything blows up.

We occasionally get emails that take an hour or so to receive and
process. Prior to 0.87, you'd see a sendmail process along with the
associated clamav-milter and spamass-milter processes hanging around
until it finished. With 0.87, the sendmail and spamass-milter processes
go ahead and end at the time of the clamav-milter crash, but sometimes
the clamav-milter processes seem to stick around forever (until I
restart clamav-milter). I've got clamav-milter processes that have been
running for over 7 hours even though the associated sendmail process is
long gone.

But I guess the big question now is: how can I determine for sure
whether it's a specifically formatted email that's causing the clamav
crashes and, if so, how can I capture one of the emails?
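
For anyone who wants to watch the stuck processes pile up without
eyeballing ps by hand, here is a minimal sketch in Python. It is an
illustration only and not part of ClamAV: the helper names (parse_etime,
find_stuck, snapshot) and the two-hour threshold are my own assumptions,
the threshold being picked only because legitimate deliveries here can
take about an hour. It assumes a ps that accepts "-o pid= -o etime=
-o args=". Note that the long-lived parent clamav-milter process will
also show up in the list, so it needs to be told apart by PID.

```python
import subprocess


def parse_etime(etime: str) -> int:
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stuck(ps_output: str, name: str = "clamav-milter",
               max_age: int = 2 * 3600):
    """Return (pid, age_in_seconds) for matching processes older than max_age.

    max_age is a hypothetical threshold, not a ClamAV setting.
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, etime, full command line
        if len(fields) != 3:
            continue
        pid, etime, args = fields
        if name in args:
            age = parse_etime(etime)
            if age > max_age:
                stuck.append((int(pid), age))
    return stuck


def snapshot() -> str:
    """Run ps and return 'pid etime args' lines without a header row."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etime=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, age in find_stuck(snapshot()):
        print(f"pid {pid} has been running for {age // 3600}h {age % 3600 // 60}m")
```

Run from cron every few minutes and logged with a timestamp, the output
could be lined up against the "Broken pipe" entries in the mail log to
check whether each error really leaves one more process behind.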

> Also check dmesg to see if it's reporting weird things such as NMI
> errors (ie bad memory);.

I checked this and nothing unusual is being reported.

-Steve

_______________________________________________
http://lurker.clamav.net/list/clamav-users.html