Amos wrote:
> Have to admit that we just experienced this on Solaris 10 x86 (AMD). I
> killed and restarted clamd and the backup of incoming mail starting
> flowing again. This is with 0.91.1. Build pretty simple:
>
> ./configure \
>   --with-user=amavis \
>   --with-group=amavis \
>   --sysconfdir=/opt/clamav/etc \
>   --with-dbdir=/opt/clamav/db
>
> That's it. Nothing fancy. Used gcc-3.4.5.

Starting with ClamAV 0.91.1, I am also seeing problems on Solaris. The issue is not high load-- the issue is messages getting jammed up in ClamAV and never fully scanned. I did not experience this problem prior to ClamAV 0.91.1.

I am using postfix 2.3.2, clamsmtp 1.6, and clamav 0.91.1 on a Solaris 10 system with the following architecture:

    SunOS somehostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T1000

The symptoms I'm seeing are as follows:

1. A message is accepted via postfix. The postfix instance hands off the message to clamsmtpd for virus scanning.

2. clamsmtpd never gets a response from clamav, because clamav has now wedged one of its threads attempting to scan the inbound message. clamd now shows 100% CPU usage.

3. Eventually, postfix will note a timeout, and will attempt to re-deliver the message from its 'active' queue. clamd now shows 200% CPU usage. This process will continue until clamsmtpd hits its maximum configured number of open clamd connections, which in my case is 16. clamsmtpd will begin to refuse additional connections from postfix when this occurs.

4. A giant backlog of e-mail quickly develops on my original postfix instance, until I shut it all down.

clamd has to be kill -9'd; kill -15 is ignored. If clamsmtpd is given a kill -15, it will not shut down until clamd is kill -9'd.

If I fire everything back up, postfix will process all of its queued messages except for the one (or more) that caused the original backlog. This makes identifying the problematic messages exceptionally simple, as they are left behind in $POSTFIX-SPOOL/active.

I have a set of a dozen messages that reliably cause this behavior. I have two identical Solaris mail servers, each with identical software; as one might expect, I get identical behavior with these problematic messages. If I attempt to re-inject one of the problematic messages on the other server, the exact same behavior (clamd usage +100%, message never gets delivered, etc.) occurs.

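For anyone who wants to triage a pile of suspect messages without tying up the live clamd, something along the following lines might work. It is only a rough sketch: it assumes the standalone `clamscan` binary hangs on the same messages that wedge clamd (not verified here), and the function name, the 60-second timeout, and the file names in the usage note are made up for illustration. Each file is scanned in its own process, and any scan that runs past the timeout is killed and reported.

```python
import subprocess


def find_wedging_messages(paths, scanner_cmd=("clamscan", "--no-summary"), timeout_s=60):
    """Return the paths whose scan did not finish within timeout_s seconds.

    scanner_cmd is an assumption: any command that takes a file path as its
    last argument can be substituted.
    """
    wedged = []
    for path in paths:
        try:
            # One scanner process per message, so a hang affects only that file.
            subprocess.run(
                [*scanner_cmd, str(path)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising, so no
            # runaway scanner process is left spinning afterwards.
            wedged.append(path)
    return wedged
```

Calling `find_wedging_messages(["msg1.eml", "msg2.eml"])` on copies of the messages left behind in the active queue would return only the ones that never finish scanning. Note that the scanner's exit status is ignored on purpose; the sketch only cares about scans that do not return at all.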
This is 100% repeatable.

I am very grateful to postfix for its graceful queue handling in adverse situations, and to the overall throughput of the mail system when it is working properly-- at its worst, I had a backlog of 12,000 messages due to this denial of service condition. Once the problem was discovered, it took postfix and clamav no more than five minutes to process and deliver the entire backlog.

I've been using ClamAV successfully for nigh on a year now, with this exact hardware and software setup, dutifully updating to each new ClamAV point release. I'm fairly sure that our manifestation of this problem is not due to our local setup. Each ClamAV compile is done in the same way on our systems:

    CFLAGS="-O3 -mcpu=ultrasparc3 -mvis" \
    CPPFLAGS="-I/local/gmp/include" \
    LDFLAGS="-L/local/gmp/lib -Wl,-rpath,/local/gmp/lib -Wl,-rpath,/local/gcc/lib" \
    ./configure --prefix=/local/clamav./$VERSION --with-zlib=/usr

Version of gmp is 4.2.1. The version of gcc is:

    $ gcc -v
    Reading specs from /local/gcc./3.4.6/lib/gcc/sparc-sun-solaris2.10/3.4.6/specs
    Configured with: ./configure --prefix=/local/gcc./3.4.6
    Thread model: posix
    gcc version 3.4.6

Using GNU binutils 2.17.

I'd be happy to provide developers with whatever reasonable information is at my disposal to help eliminate this denial of service condition.

--Kyle