Hi all,

I've recently deployed clamav-milter/clamd on an existing server, processing several thousand messages per hour. The system:
FreeBSD earl.sasknow.net 4.9-RELEASE-p1 FreeBSD 4.9-RELEASE-p1 #33: Wed Jan 14 18:09:39 CST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/EARL i386

clamav-0.65_3

The system has 512MB RAM and a P3-1.0GHz, and has been rock-solid for over a year. Here are the relevant settings:

  clamd (no flags)
  clamav-milter --postmaster-only --local --outgoing --max-children=50

I'm using the default clamav.conf, with the exception of enabling StreamSaveToDisk (to scan archives).

The system works fine for hours at a time, but I've started seeing an alarming number (hundreds) of lines like the following in /var/log/messages:

  Feb 1 23:53:13 earl /kernel: pid 47145 (clamav-milter), uid 0: exited on signal 11 (core dumped)
  Feb 1 23:53:47 earl /kernel: pid 47156 (clamav-milter), uid 0: exited on signal 11 (core dumped)
  Feb 1 23:53:53 earl /kernel: pid 47158 (clamav-milter), uid 0: exited on signal 11 (core dumped)

Once this happens, the system basically grinds to a useless crawl (more on that below).

So I rebuilt clamav-milter with debug info and got the following backtrace:

  Core was generated by `clamav-milter'.
  Program terminated with signal 11, Segmentation fault.
  Reading symbols from /usr/lib/libmilter.so.2...done.
  Reading symbols from /usr/lib/libc_r.so.4...done.
  Reading symbols from /usr/libexec/ld-elf.so.1...done.
  #0  0x280c446e in fileno () from /usr/lib/libc_r.so.4
  (gdb) bt
  #0  0x280c446e in fileno () from /usr/lib/libc_r.so.4
  #1  0x280ac8aa in popen () from /usr/lib/libc_r.so.4
  #2  0x804a8b4 in clamfi_eom (ctx=0x805b300) at clamav-milter.c:1246
  #3  0x28070c0f in mi_clr_macros () from /usr/lib/libmilter.so.2
  #4  0x28070100 in mi_engine () from /usr/lib/libmilter.so.2
  #5  0x2806fd79 in mi_handle_session () from /usr/lib/libmilter.so.2
  #6  0x2806f59e in mi_thread_handle_wrapper () from /usr/lib/libmilter.so.2
  #7  0x2808f11c in _thread_start () from /usr/lib/libc_r.so.4
  #8  0xbfabaffc in ?? ()
  (gdb)

clamd.log shows nothing, save hundreds of lines like this in between the successful SelfChecks:

  stream: Worm.SCO.A FOUND

clamav-milter continues to work, sort of, but the remaining child processes seem to divide up all of the remaining CPU time (roughly 100% divided by the number of children; when this happens there are usually 10-20 of them, all runnable, versus the normal 1-2), and load averages skyrocket from under 1.00 to over 20.00. The system isn't swapping, and vmstat shows minimal I/O and page faults; CPU is ~100% user. The system is initially responsive, but slows within a few minutes. Of course, due to the high load average, sendmail starts refusing connections with a tempfail.

So, it doesn't take a genius to realize that this is probably related to the enormous load created by the SCO worm(s)... but something more sinister is going on (an infinite loop somewhere? a busy-wait deadlock?), because the remaining clamav-milter processes seem to spin madly and only exit by crashing (as logged in /var/log/messages), which drives the load up even further. When this happens, I have no shortage of coredumps for debugging.

Where should I look next?

Thanks,

- Ryan