Dear Ian,

Some time ago you wrote, in answer to one of my questions:

On 02/01/07 21:51, Rolf E. Sonneveld wrote:

Ian Abbott wrote:

On 02/01/07 12:12, Rolf E. Sonneveld wrote:

According to the monitoring system, the freshclam process disappeared between 14:29 and 14:34. Running ClamAV on Solaris 9. Any idea why after a 'connection refused' or 'connection timed out' the freshclam process dies?


It would be nice if there was an option to run freshclam as a "foreground daemon" so you could monitor its exit status, but there isn't. My guess is that it's receiving a signal whose current action is set to kill the process.

The signal handling for SIGALRM and SIGUSR1 in freshclam.c's main() function is a bit buggy. It sets the following actions in the main loop:

        sigaction(SIGALRM, &sigact, &oldact);
        sigaction(SIGUSR1, &sigact, &oldact);

then later on:

        sigaction(SIGALRM, &oldact, NULL);
        sigaction(SIGUSR1, &oldact, NULL);

There are two problems here. The two signals shouldn't really be using the same variable 'oldact', even though the default action for both signals is the same. The other problem is that the program spends some of its time with the SIGALRM and SIGUSR1 signals set to the default action, which is to terminate the process. In fact, the more I look at the main loop of the freshclam daemon, the worse it gets! It may catch SIGHUP and set the 'terminate' variable at the wrong time, causing the main loop to exit prematurely, or it may fail to catch 'SIGALRM' or 'SIGUSR1' some of the time, causing the process to terminate with that signal.
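To illustrate the first point, here is a minimal sketch (not the actual freshclam.c code) that saves the previous action of each signal in its own variable and restores each one from its own copy. The names on_signal, install_handlers, restore_handlers, got_alarm and got_usr1 are hypothetical and chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flags; these names do not come from freshclam.c. */
static volatile sig_atomic_t got_alarm = 0;
static volatile sig_atomic_t got_usr1 = 0;

/* Only sets a flag, which is safe to do inside a signal handler. */
static void on_signal(int sig)
{
    if (sig == SIGALRM)
        got_alarm = 1;
    else if (sig == SIGUSR1)
        got_usr1 = 1;
}

/* Install one handler for both signals, saving each previous
 * action in a separate struct so neither overwrites the other. */
static int install_handlers(struct sigaction *old_alrm,
                            struct sigaction *old_usr1)
{
    struct sigaction sa;

    assert(old_alrm != NULL && old_usr1 != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGALRM, &sa, old_alrm) != 0)
        return -1;
    if (sigaction(SIGUSR1, &sa, old_usr1) != 0)
        return -1;
    return 0;
}

/* Restore each signal from its own saved action. */
static int restore_handlers(const struct sigaction *old_alrm,
                            const struct sigaction *old_usr1)
{
    assert(old_alrm != NULL && old_usr1 != NULL);
    if (sigaction(SIGALRM, old_alrm, NULL) != 0)
        return -1;
    return sigaction(SIGUSR1, old_usr1, NULL);
}
```

This only addresses the shared 'oldact' variable. It does not close the window in which the signals are back at their default action.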


Thanks, Ian. This sounds interesting. If I understand you correctly, this can be related to the problem we see, with the disappearing freshclam daemon process? I'm not a programmer so I'm afraid I can't contribute code here; also, I'm not familiar with the way ClamAV changes/fixes are done. Is anyone in charge of the freshclam code?


It might be the problem, especially if you are sending a signal (SIGHUP) to the freshclam process from a log rotation script. If this occurs almost immediately after an internally generated SIGALRM, it could cause the main loop to terminate early, though that is extremely unlikely as the time window is very small. A far more likely cause is that the process is woken up by the SIGHUP and then the internally generated SIGALRM occurs later, killing the process. The program uses the default SIGALRM handler while it is doing all the network stuff, for example, so if the process is woken by an external SIGHUP, spends a lot of time doing network stuff, and receives the internally generated SIGALRM at this time, the process will be killed.
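One possible way to avoid this scenario, sketched below under my own assumptions and not taken from freshclam, is to leave the handler installed all the time and merely block SIGALRM and SIGUSR1 while the long-running work is in progress. A signal that arrives during that time stays pending and is delivered to the handler once the old mask is restored, so the default action never gets a chance to kill the process. All names here (on_wake, install_wake_handler, run_with_wake_signals_blocked, simulated_network_work) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag set by the handler when a wake-up arrives. */
static volatile sig_atomic_t wake_requested = 0;

/* Records what the flag looked like while the signals were blocked. */
static int seen_during_work = -1;

static void on_wake(int sig)
{
    (void)sig;
    wake_requested = 1;
}

/* Install the handler once; it is never reset to the default action. */
static int install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Run work() with SIGALRM and SIGUSR1 blocked. Signals that arrive
 * meanwhile are held pending and reach on_wake() when the previous
 * mask is restored. */
static int run_with_wake_signals_blocked(void (*work)(void))
{
    sigset_t block, saved;

    assert(work != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigaddset(&block, SIGUSR1);

    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;
    work();
    return sigprocmask(SIG_SETMASK, &saved, NULL);
}

/* Stand-in for the network code: a SIGALRM arrives in the middle of it. */
static void simulated_network_work(void)
{
    raise(SIGALRM);
    seen_during_work = wake_requested; /* handler has not run yet */
}
```

Calling install_wake_handler() and then run_with_wake_signals_blocked(simulated_network_work) leaves the process alive, with the wake-up flag set only after the work has finished.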

I'll mention my theory on the devel list, anyway.


Did you get any response on this issue on the development list? The problem still occurs now and then (occasionally, once every two or three weeks, without a pattern). Today I came into the office and found freshclam had died again. Logfile:

--------------------------------------
Received signal: wake up
ClamAV update process started at Thu Feb  8 04:03:52 2007
WARNING: Your ClamAV installation is OUTDATED!
WARNING: Local version: 0.88.6 Recommended version: 0.88.7
DON'T PANIC! Read http://www.clamav.net/faq.html
main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder: tkojm)
daily.cvd is up to date (version: 2533, sigs: 5388, f-level: 9, builder: sven)
--------------------------------------
Received signal: wake up
ClamAV update process started at Thu Feb  8 04:33:52 2007
WARNING: Your ClamAV installation is OUTDATED!
WARNING: Local version: 0.88.6 Recommended version: 0.88.7
DON'T PANIC! Read http://www.clamav.net/faq.html
main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder: tkojm)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
connect_error: getsockopt(SO_ERROR): fd=0 error=145: Connection timed out

No core file found. Unfortunately, enabling Debug does not show timestamps.
Running:

-bash-3.00$ /opt/ClamAV/sbin/clamd -V
ClamAV 0.88.6/2534/Thu Feb  8 04:28:17 2007

The ClamAV mirror defined is:

bash-3.00# grep -i db /opt/ClamAV/etc/freshclam.conf
DatabaseMirror db.DE.clamav.net

We have seen the same problem when using db.NL.clamav.net. Looking at the availability figures for Germany (http://www.clamav.net/mirrors.html#de), it seems only one server had a temporary failure last night, which roughly matches the time the problem occurred.

Which of the following does the freshclam daemon do?

a) Do one DNS lookup (finding multiple A records) and, after the first host fails, try the second host and so on.
b) Perform a new DNS lookup after each failed connection.

In case a) I can't understand why freshclam would fail seven times, unless there had been a network problem on our host (there wasn't). In case b) it is possible that the system gets the same IP address every time, depending on the DNS client library and the way it sorts the results.
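For reference, strategy a) could look roughly like the sketch below: one lookup with the standard getaddrinfo() call, then one connection attempt per returned address. This is only an illustration of the idea; I do not know whether freshclam works this way, and the function name connect_first_reachable and its attempts parameter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host once, then try every returned address in order.
 * Returns a connected socket, or -1 if the lookup failed or no
 * address accepted the connection. *attempts receives the number
 * of addresses that were tried. */
static int connect_first_reachable(const char *host, const char *port,
                                   int *attempts)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(host != NULL && port != NULL && attempts != NULL);
    *attempts = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        (*attempts)++;
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break; /* connected; keep this socket */
        close(fd); /* this address failed; move on to the next one */
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

With this approach a single unreachable mirror costs one failed attempt, and seven consecutive failures would require seven different addresses to be unreachable. The sketch uses a plain blocking connect(), unlike the non-blocking connect with a 30-second timeout that the log suggests.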

FYI, the system on which ClamAV is running is a Solaris 10 system. I hope there will be a fix for this in the next release.

Regards,
/rolf