Michael Dankov wrote:
> Do you still have timeouts reported by clamd? Be sure you enable clamd > logging either by LogFile or LogSyslog in clamav.conf. I have "LogSyslog" set in clamav.conf, and the last time it hiccuped was this morning, starting at about 7:20. I'm not seeing explicit timeout errors at the moment. > Do those log entries appear some time (up to 20min or even more) after > clamd session times out, exactly one entry per timeout? > > Is this all happens with --quarantine-dir enabled? I mean > --quarantine-dir, not --quarantine because these two optons do nearly the > same for end user but in completely different ways. It all is happening with --quarantine-dir enabled. A brief chronology of this mornings problem appears to be:- Dec 12 07:16:35 castor clamav-milter[8758]: clean message from <[EMAIL PROTECTED]> Dec 12 07:18:08 castor clamav-milter[8880]: clamfi_connect: connection from castor.acu.ac.uk [194.81.120.81] Dec 12 07:20:23 castor clamd[12601]: SelfCheck: Database status OK. Dec 12 07:20:48 castor clamav-milter[9396]: clamfi_connect: connection from web14421.mail.yahoo.com [216.136.174.201] Dec 12 07:20:49 castor clamav-milter[9396]: clean message from <[EMAIL PROTECTED]> All is fine at this point. At 7:20am, clamd runs a self-check. I am now almost *certain* that the error condition is triggered by this self-check, as I've never consciously seen a failure without an attached self-check. Shortly after the self-check, it becomes apparent that things are going wrong:- Dec 12 07:29:34 castor clamav-milter[10242]: clamfi_connect: connection from 217-162-78-18.dclient.hispeed.ch [217.162.78.18] Dec 12 07:29:34 castor clamav-milter[10245]: clamfi_connect: connection from 217-162-78-18.dclient.hispeed.ch [217.162.78.18] Dec 12 07:29:35 castor clamav-milter[10248]: clamfi_connect: connection from [210.205.248.159] [210.205.248.159] Dec 12 07:29:35 castor clamav-milter[10253]: clamfi_connect: connection from [210.205.248.159] [210.205.248.159] Dec 12 07:29:35 castor clamav-milter[10255]: clamfi_connect: connection from ga-cmng-cuda2-c4d-13.atlaga.adelphia.net [68.170.180.13] Dec 12 07:29:36 castor clamav-milter[10258]: clamfi_connect: connection from [210.205.248.159] [210.205.248.159] Dec 12 07:29:36 castor clamav-milter[10261]: clamfi_connect: connection from [218.157.2.233] [218.157.2.233] Dec 12 07:29:36 castor clamav-milter[10264]: clamfi_connect: connection from 217-162-78-18.dclient.hispeed.ch [217.162.78.18] Dec 12 07:29:36 castor clamav-milter[10269]: clamfi_connect: connection from [210.205.248.159] [210.205.248.159] Dec 12 07:29:36 castor clamav-milter[10272]: clamfi_connect: connection from [61.43.251.223] [61.43.251.223] Dec 12 07:29:36 castor clamav-milter[10273]: clamfi_connect: connection from [61.43.251.223] [61.43.251.223] Dec 12 07:29:36 castor clamav-milter[10276]: clamfi_connect: connection from [219.232.182.235] [219.232.182.235] Dec 12 07:29:37 castor clamav-milter[10264]: hit max-children limit (5 >= 5): waiting for some to exit Dec 12 07:29:37 castor clamav-milter[10278]: clamfi_connect: connection from [218.149.61.87] [218.149.61.87] Dec 12 07:29:37 castor clamav-milter[10255]: hit max-children limit (5 >= 5): waiting for some to exit Normally, each "connection from" message is followed by the scan result - either "clean" or "virus found". The first odd thing in the logs is a long list of connects with no scan results (positive or negative) after them, then the max-children messages start. At this point, AFAICT, every mail is slowed down by sendmail waiting for the milter to time-out, and the system starts passing mail unscanned by ClamAV. Soon after, the "private data not NULL" errors start. Dec 12 07:30:20 castor clamav-milter[11018]: clamfi_connect: connection from vscan-a.ucl.ac.uk [144.82.100.150] Dec 12 07:30:20 castor clamav-milter[11018]: clean message from <[EMAIL PROTECTED]> Dec 12 07:30:37 castor clamav-milter[10255]: ClamAv: private data not NULL Dec 12 07:30:37 castor clamav-milter[10264]: ClamAv: private data not NULL Dec 12 07:30:38 castor clamav-milter[10273]: ClamAv: private data not NULL Dec 12 07:30:38 castor clamav-milter[10272]: ClamAv: private data not NULL Dec 12 07:30:38 castor clamav-milter[11028]: clamfi_connect: connection from vscan-a.ucl.ac.uk [144.82.100.150] Dec 12 07:30:38 castor clamav-milter[11028]: hit max-children limit (5 >= 5): waiting for some to exit Dec 12 07:30:39 castor clamav-milter[10289]: ClamAv: private data not NULL Dec 12 07:30:39 castor clamav-milter[10258]: ClamAv: private data not NULL Dec 12 07:30:39 castor clamav-milter[11031]: clamfi_connect: connection from vscan-b.ucl.ac.uk [144.82.100.151] Dec 12 07:30:39 castor clamav-milter[11031]: hit max-children limit (7 >= 5): waiting for some to exit It then accumulated processes until I notice, and kill and restart it. This morning, that was when I sent my first mail out at 10:15:- Dec 12 10:16:02 castor clamav-milter[4873]: hit max-children limit (351 >= 5): waiting for some to exit Dec 12 10:16:19 castor clamav-milter[4809]: ClamAv: private data not NULL Dec 12 10:16:26 castor clamav-milter[13004]: ClamAv: mi_stop=1 Something appears to have improved recently, beacuse with the 09122003 CVS, although I still get the huge max-children numbers in the logs, I don't actually seem to have hundreds of processes any more. So, I think that the self-check is actually where it starts to go wrong, and that clamd falls over somehow, which causes the milter to block and time-out, and also means that it doesn't always exit cleanly (leading to the not NULL errors). Mike. ------------------------------------------------------- This SF.net email is sponsored by: IBM Linux Tutorials. Become an expert in LINUX or just sharpen your skills. Sign up for IBM's Free Linux Tutorials. Learn everything from the bash shell to sys admin. Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click _______________________________________________ Clamav-users mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/clamav-users