Dear Ian,
some time ago you wrote, in answer to one of my questions:
On 02/01/07 21:51, Rolf E. Sonneveld wrote:
Ian Abbott wrote:
On 02/01/07 12:12, Rolf E. Sonneveld wrote:
According to the monitoring system, the freshclam process
disappeared between 14:29 and 14:34. Running ClamAV on Solaris 9.
Any idea why after a 'connection refused' or 'connection timed out'
the freshclam process dies?
It would be nice if there was an option to run freshclam as a
"foreground daemon" so you could monitor its exit status, but there
isn't. My guess is that it's receiving a signal whose current
action is set to kill the process.
The signal handling for SIGALRM and SIGUSR1 in freshclam.c's main()
function is a bit buggy. It sets the following actions in the main
loop:
sigaction(SIGALRM, &sigact, &oldact);
sigaction(SIGUSR1, &sigact, &oldact);
then later on:
sigaction(SIGALRM, &oldact, NULL);
sigaction(SIGUSR1, &oldact, NULL);
There are two problems here. The two signals shouldn't really be
using the same variable 'oldact', even though the default action for
both signals is the same. The other problem is that the program
spends some of its time with the SIGALRM and SIGUSR1 signals set to
the default action, which is to terminate the process. In fact, the
more I look at the main loop of the freshclam daemon, the worse it
gets! It may catch SIGHUP and set the 'terminate' variable at the
wrong time, causing the main loop to exit prematurely, or it may
fail to catch 'SIGALRM' or 'SIGUSR1' some of the time, causing the
process to terminate with that signal.
Thanks, Ian. This sounds interesting. If I understand you correctly,
this can be related to the problem we see, with the disappearing
freshclam daemon process? I'm not a programmer so I'm afraid I can't
contribute code here; also, I'm not familiar with the way ClamAV
changes/fixes are done. Is anyone in charge of the freshclam code?
It might be the problem, especially if you are sending a signal
(SIGHUP) to the freshclam process from a log rotation script. If this
occurs almost immediately after an internally generated SIGALRM, it
could cause the main loop to terminate early, though that is extremely
unlikely as the time window is very small. A far more likely cause is
that the process is woken up by the SIGHUP and then the internally
generated SIGALRM occurs later, killing the process. The program uses
the default SIGALRM handler while it is doing all the network stuff,
for example, so if the process is woken by an external SIGHUP, spends
a lot of time doing network stuff, and receives the internally
generated SIGALRM at this time, the process will be killed.
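The effect of taking SIGALRM with the default action can be shown in isolation. The following is only an illustration, not freshclam code: the function name fate_with_default_sigalrm is invented, and raise() stands in for the pending alarm firing during the network phase.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that receives SIGALRM while the default action is in
 * effect. Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 on error.
 */
int fate_with_default_sigalrm(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        signal(SIGALRM, SIG_DFL); /* default action: terminate */
        raise(SIGALRM);           /* stands in for the alarm firing */
        _exit(0);                 /* not reached: the signal kills us first */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The child never reaches _exit(); the parent sees it terminated by SIGALRM, which is what a monitoring system would report as a vanished process.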
I'll mention my theory on the devel list, anyway.
Did you get any response on this issue on the development list? The
problem still occurs now and then (roughly once every two or three
weeks, without a pattern). Today I came into the office and found that
freshclam had died again. Logfile:
--------------------------------------
Received signal: wake up
ClamAV update process started at Thu Feb 8 04:03:52 2007
WARNING: Your ClamAV installation is OUTDATED!
WARNING: Local version: 0.88.6 Recommended version: 0.88.7
DON'T PANIC! Read http://www.clamav.net/faq.html
main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder: tkojm)
daily.cvd is up to date (version: 2533, sigs: 5388, f-level: 9, builder: sven)
--------------------------------------
Received signal: wake up
ClamAV update process started at Thu Feb 8 04:33:52 2007
WARNING: Your ClamAV installation is OUTDATED!
WARNING: Local version: 0.88.6 Recommended version: 0.88.7
DON'T PANIC! Read http://www.clamav.net/faq.html
main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder: tkojm)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
nonblock_connect: connect timing out (30 secs)
connect_error: getsockopt(SO_ERROR): fd=0 error=145: Connection timed out
No core file found. Unfortunately, enabling Debug does not show timestamps.
Running:
-bash-3.00$ /opt/ClamAV/sbin/clamd -V
ClamAV 0.88.6/2534/Thu Feb 8 04:28:17 2007
The ClamAV mirror defined is:
bash-3.00# grep -i db /opt/ClamAV/etc/freshclam.conf
DatabaseMirror db.DE.clamav.net
We have seen the same problem when using db.NL.clamav.net. Looking at
the availability figures for Germany
(http://www.clamav.net/mirrors.html#de), it seems that only one server
had a temporary failure tonight (which roughly matches the time the
problem occurred).
What does the freshclam daemon do when a connection fails? Does it
a) do one DNS lookup (finding multiple A records) and, after the first
host fails, take the second host and so on, or
b) perform a new DNS lookup after each failed connection?
In case a) I can't understand why freshclam would fail seven times,
except when there has been a network problem for this host (there
wasn't). In case b) it is possible that the system each time gets the
same IP address (depends on the DNS client library and the way the
results are sorted).
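For reference, behaviour a) could look roughly like the sketch below. It is an assumption about how such a loop might be written, not freshclam's actual code: the function name connect_first_available is invented, and it uses a plain blocking connect() rather than the 30-second non-blocking connect seen in the log.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve the host once, then try every returned address in order.
 * Returns a connected socket descriptor, or -1 if the lookup failed
 * or no address accepted the connection.
 */
int connect_first_available(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;                 /* try the next address */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

With a loop like this, seven consecutive timeouts would mean seven different addresses failed. In case b) the same address could be retried each time, depending on how the resolver orders its results.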
FYI, the system on which ClamAV is running is a Solaris 10 system. I
hope there will be a fix for this in the next release.
Regards,
/rolf