On Feb 18, 2005, at 4:12 AM, Trog wrote:

This really looks like you're running out of some resource. That
accept() failure is from the clamd primary socket. We will need to find
out what the error is. Please try this patch:

Hi Trog and Andy-

Thanks for your responses. I've just patched my sources as instructed. I don't know the clamav code nearly as well as you folks do, so please take this with a grain of salt, but my intuition tells me the patch won't illuminate the primary problem. Here's why:

I think the accept() errors only happened back when my server was allowed to linger in its hung state long enough for resource exhaustion of some sort to take place (i.e. it stayed in the bad state for over an hour).

The monitoring software I installed after this started to happen has restarted clamav twice since my last message to this list. The software (Eugene Kurmanin's ClamdMon) attempts to push EICAR through the server once every 10 minutes, and if the server doesn't respond within a minute it returns a failure code that triggers a restart. I wasn't near a computer either time, so I don't have any details beyond what my logs show.
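For anyone who wants to reproduce this kind of liveness check, here is a minimal sketch. It is not ClamdMon's code, and it takes a simpler route than pushing EICAR through a scan: it sends clamd's PING command over TCP and waits for PONG. The function name clamd_is_alive and its defaults (127.0.0.1, port 3310, a 60-second timeout) are my own choices for illustration.

```python
import socket


def clamd_is_alive(host="127.0.0.1", port=3310, timeout=60.0):
    """Return True if clamd answers PING with PONG.

    Hypothetical helper for illustration only; the timeout applies to
    each socket operation (connect, send, receive), not to the total.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\n")
            reply = sock.recv(64)
    except OSError:
        # Refused connections, timeouts and other socket errors all
        # count as "not alive".
        return False
    return reply.strip() == b"PONG"
```

A cron job could call this every 10 minutes and restart clamd when it returns False. A PING only shows that the daemon still answers on its socket, so it is a weaker test than a real EICAR scan, which also exercises the scanning threads and the database.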

In the first case the clamd log reported:

Fri Feb 18 09:42:44 2005 -> /var/spool/exim/scan/1D29L6-0000FS-7X/1D29L6-0000FS-7X.eml: Worm.SomeFool.P FOUND
Fri Feb 18 09:51:03 2005 -> +++ Started at Fri Feb 18 09:51:03 2005


The second case:

Fri Feb 18 10:01:09 2005 -> No stats for Database check - forcing reload
Fri Feb 18 10:11:03 2005 -> +++ Started at Fri Feb 18 10:11:03 2005

If we look at the exim log for roughly the same time period, we see:

2005-02-18 09:48:57 1D29R5-0001nI-PD malware acl condition: clamd: connection to 127.0.0.1, port 3310 failed (Bad file number)


(There is one of those for every message that tries to pass through the system from that point on.) I see no indication of any of these failed attempts in my clamd.log. There are no accept() errors. Nothing else on the machine (e.g. the exim processes) leads me to believe it has become memory, process or swap starved.

My current thought is that something jams in the server, sometimes in connection with database-related activity. It largely continues to respond to incoming connections but never recovers. If my monitoring software didn't step in, it would carry on until some resource is exhausted, at which point it would stop accepting network connections. Does this theory sound wacky?

Let's assume I'm around to catch this happening. What would you recommend I do to get more information? If I connect with gdb, are there specific commands you'd like me to run?

In the meantime, I'm going to work on a source build of clamd that uses the latest and greatest of all of the dependent libraries (libz, gmp, etc.) in the hope that it helps.
-- dNb


_______________________________________________
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users