Amos wrote:
> Have to admit that we just experienced this on Solaris 10 x86 (AMD). I 
> killed and restarted clamd and the backup of incoming mail started 
> flowing again. This is with 0.91.1. Build pretty simple:
> 
> ./configure \
>     --with-user=amavis \
>     --with-group=amavis \
>     --sysconfdir=/opt/clamav/etc \
>     --with-dbdir=/opt/clamav/db
> 
> That's it. Nothing fancy. Used gcc-3.4.5.

Starting with ClamAV 0.91.1, I am also seeing problems on Solaris.
The issue is not high load; it is messages getting jammed up in
ClamAV and never fully scanned. I did not experience this problem
prior to ClamAV 0.91.1.

I am using postfix 2.3.2, clamsmtp 1.6, and clamav 0.91.1 on a Solaris 10
system with the following architecture:

SunOS somehostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T1000

The symptoms I'm seeing are as follows:

1. A message is accepted via postfix. The postfix instance hands off the
   message to clamsmtpd for virus scanning.

2. clamsmtpd never gets a response from clamav, because clamav has now
   wedged one of its threads attempting to scan the inbound message.
   clamd now shows 100% CPU usage.

3. Eventually, postfix will note a timeout, and will attempt to re-deliver
   the message from its 'active' queue. clamd now shows 200% CPU usage.
   This process will continue until clamsmtpd hits its maximum configured
   number of open clamd connections, which in my case is 16. clamsmtpd
   will begin to refuse additional connections from postfix when this
   occurs.

4. A giant backlog of e-mail quickly develops on my original postfix
   instance, until I shut it all down. clamd has to be kill -9'd;
   kill -15 is ignored. If clamsmtpd is given a kill -15, it will not
   shut down until clamd is kill -9'd.

If I fire everything back up, postfix will process all of its queued
messages except for the one (or more) that caused the original backlog.
This makes identifying the problematic messages exceptionally simple,
as they are left behind in $POSTFIX-SPOOL/active.
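
A minimal sketch of how the leftover files could be listed, assuming the
active queue directory is readable and that file modification time is an
acceptable rough proxy for how long a message has been stuck. The function
name stale_queue_files and the 300-second threshold are hypothetical, not
part of postfix:

```python
import os
import time


def stale_queue_files(active_dir, min_age_seconds=300, now=None):
    """Return (path, age_in_seconds) pairs for files older than the threshold.

    active_dir is assumed to be the postfix 'active' queue directory.
    Results are sorted oldest first.
    """
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(active_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                age = now - os.stat(path).st_mtime
            except OSError:
                # The file was delivered or moved while we were looking.
                continue
            if age >= min_age_seconds:
                stale.append((path, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling stale_queue_files() on the active directory after a restart, once
the rest of the queue has drained, should leave only the suspect messages
in the result.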

I have a set of a dozen messages that reliably cause this behavior.
I have two identical Solaris mail servers, each with identical software;
as one might expect, I get identical behavior with these problematic
messages. If I attempt to re-inject one of the problematic messages
on the other server, the exact same behavior (clamd usage +100%,
message never gets delivered, etc.) occurs.
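
One way to confirm which saved messages trigger the hang is to scan each
one with a hard time limit. The following is a sketch only: the function
name scan_with_timeout and the 60-second limit are hypothetical, and it
assumes clamdscan is on the PATH and that clamd can read the file. Note
that killing the client after a timeout would not be expected to free the
wedged clamd thread described above.

```python
import subprocess


def scan_with_timeout(path, timeout_seconds=60, scanner=("clamdscan",)):
    """Scan one file; return 'clean', 'infected', 'error', or 'timeout'."""
    try:
        result = subprocess.run(
            list(scanner) + [path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The scanner never answered: this is the symptom being hunted.
        return "timeout"
    except OSError:
        # The scanner binary is missing or could not be executed.
        return "error"
    # clamdscan exit codes: 0 = no virus found, 1 = virus found, 2 = error.
    return {0: "clean", 1: "infected"}.get(result.returncode, "error")
```

Any message for which this returns 'timeout' is a candidate for the set
of problematic messages.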

This is 100% repeatable. I am very grateful to postfix for its graceful
queue handling in adverse situations, and for the overall throughput of
the mail system when it is working properly. At its worst, I had a
backlog of 12,000 messages due to this denial of service condition. Once
the problem was discovered, it took postfix and clamav no more than five
minutes to process and deliver the entire backlog.

I've been using ClamAV successfully for nigh on a year now, with this
exact hardware and software setup, dutifully updating to each new ClamAV
point release. I'm fairly sure that our manifestation of this problem
is not due to our local setup.

Each ClamAV compile is done in the same way on our systems:

CFLAGS="-O3 -mcpu=ultrasparc3 -mvis" \
CPPFLAGS="-I/local/gmp/include" \
LDFLAGS="-L/local/gmp/lib -Wl,-rpath,/local/gmp/lib -Wl,-rpath,/local/gcc/lib" \
./configure \
    --prefix=/local/clamav./$VERSION \
    --with-zlib=/usr

Version of gmp is 4.2.1.

The version of gcc is:

$ gcc -v
Reading specs from /local/gcc./3.4.6/lib/gcc/sparc-sun-solaris2.10/3.4.6/specs
Configured with: ./configure --prefix=/local/gcc./3.4.6
Thread model: posix
gcc version 3.4.6

Using GNU binutils 2.17.

I'd be happy to provide developers with whatever reasonable information
is at my disposal to help eliminate this denial of service condition.

--Kyle