Michelle Konzack wrote:
> SpamAssassin already works, but what do I need to do if I want to use
> ClamAV over the network with 4-12 scanning machines?

Hi Michelle,
a definitive answer would require better knowledge of your
environment, and I'm not a courier-mta user myself.
However, here are some generic suggestions that may help.

First of all, ClamAV is generally faster and much less resource-hungry
than SpamAssassin, so the obvious choice is to run ClamAV first and SA
next: infected messages are then rejected before the more expensive SA
check ever runs.
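To make the ordering argument concrete, here is a minimal Python sketch of a short-circuiting filter chain. The two check functions are hypothetical stand-ins for real ClamAV and SpamAssassin calls; the point is only that a rejection by the cheap check means the expensive one is never invoked.

```python
from typing import Callable, List, Optional, Tuple

# A check returns None if the message passes, or a rejection reason.
Check = Callable[[bytes], Optional[str]]


def run_filters(message: bytes,
                checks: List[Tuple[str, Check]]) -> Tuple[Optional[str], List[str]]:
    """Run checks in order and stop at the first rejection.

    Returns (rejection reason or None, names of the checks that ran)."""
    ran: List[str] = []
    for name, check in checks:
        ran.append(name)
        reason = check(message)
        if reason is not None:
            return reason, ran  # later (more expensive) checks are skipped
    return None, ran


# Hypothetical stand-ins for the real scanners (illustration only).
def fake_clamav(msg: bytes) -> Optional[str]:
    return "virus found" if b"EICAR" in msg else None


def fake_spamassassin(msg: bytes) -> Optional[str]:
    return "spam" if b"viagra" in msg.lower() else None


if __name__ == "__main__":
    chain = [("clamav", fake_clamav), ("spamassassin", fake_spamassassin)]
    print(run_filters(b"EICAR test", chain))
```

With ClamAV first in the list, an infected message costs one cheap scan; putting SA first would make every infected message pay for a full SA run too.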

Second, avoid middleware-generated overhead whenever possible. For
example, if your MTA can interface natively with SA and clamd, don't use
amavis. If it can't, use amavis only as glue and disable all of its own
checks. Of course, both suggestions assume that you don't need amavis's
own functionality and only use it as glue.
Since I've mentioned amavis, please also be aware that, under the most
common configuration, it causes each message to be scanned essentially
twice: first each attachment separately, then the full message
(including all the attachments). If you can, let clamav scan only the
full message.

Third, carefully balance latency and performance. You can control the
number of scanning threads in clamd via the MaxThreads directive.
Performance-wise, the optimal number of threads is somewhere between N
and N*2 (with N+1 or N+2 likely being the absolute best), where N is the
total number of CPU cores. Note, however, that when all the scan
threads are busy, further requests are queued and possibly refused.
You certainly want enough threads available that scan requests from the
MTA are not refused or delayed for too long. At the same time, avoid an
excessive number of threads, as this only wastes resources.
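As a quick worked example of the N to N*2 rule of thumb above, here is a small Python helper. It is only a sketch: it uses os.cpu_count(), which counts logical CPUs (hyperthreads included) and may therefore differ from the physical core count you actually want for N, and it clamps the "likely best" values so they never exceed 2N on very small boxes.

```python
import os
from typing import Optional, Tuple


def max_threads_range(cores: Optional[int] = None) -> Tuple[int, int, Tuple[int, int]]:
    """Rule of thumb for clamd's MaxThreads: between N and 2N,
    with N+1 or N+2 likely best.

    N defaults to os.cpu_count(), i.e. logical CPUs (an assumption;
    pass the physical core count explicitly if you know it)."""
    n = cores if cores is not None else (os.cpu_count() or 1)
    low, high = n, 2 * n
    # Clamp the preferred values so they stay inside [N, 2N] when N is tiny.
    best = (min(n + 1, high), min(n + 2, high))
    return low, high, best


if __name__ == "__main__":
    low, high, best = max_threads_range()
    print(f"# reasonable range: {low}..{high} threads")
    print(f"MaxThreads {best[0]}")  # a clamd.conf line to start from
```

On a 4-core box this gives a range of 4 to 8 with 5 or 6 as the first values to try; measure latency under your real load before settling on one.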

Fourth, avoid I/O as much as possible. Although clamav is mostly
CPU-bound, disk I/O can badly hurt clamd's performance in busy
environments. Besides reading the files to be checked, clamd may
internally generate quite a few temporary files. Under light load these
files are very short-lived and never actually touch the disk, so no time
is spent on I/O. Under heavy load, however, the kernel may decide to
commit them to disk (or to the journal) in order to free some memory.
This increases iowait and hurts scan performance.
If you have the choice, pick a box with more RAM and slower disks, and
use tmpfs for the clamd temporary directory (TemporaryDirectory in
clamd.conf) and for the MTA (or amavis) scan spool (not the mail spool
directory!).


Back to your specific issue: clamd can scan streams received over the
network. All you have to do is set up a TCP socket (TCPSocket, and
optionally TCPAddr, in clamd.conf) instead of, or in addition to, the
unix socket.
Then you need a clamd client that can communicate properly with a remote
clamd. Since clamav-milter is not an option in your case, the most
obvious choice is probably clamdscan, called from a tiny courier
perlfilter script or via amavisd.
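If you end up writing your own small filter rather than calling clamdscan, this is roughly what talking to a remote clamd over TCP looks like. It is a Python sketch, not courier's or clamdscan's actual code, and it assumes a clamd recent enough to support the INSTREAM command (ClamAV 0.95 or later; older releases used the STREAM command instead). The chunk size is an arbitrary choice, and each chunk should stay below clamd's StreamMaxLength.

```python
import socket
import struct
from typing import Iterator

CHUNK_SIZE = 8192  # arbitrary; keep it below clamd's StreamMaxLength


def instream_frames(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the byte frames for clamd's INSTREAM command.

    Each chunk is prefixed with its length as a 4-byte big-endian
    unsigned int; a zero-length chunk marks the end of the stream."""
    yield b"zINSTREAM\0"  # 'z' prefix: null-terminated command and reply
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)


def scan_remote(host: str, port: int, data: bytes, timeout: float = 30.0) -> str:
    """Stream data to a remote clamd over TCP and return its reply,
    e.g. 'stream: OK' or 'stream: <signature> FOUND'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):  # reply is null-terminated
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("ascii", errors="replace")
```

Usage would be something like scan_remote("scanner1.example", 3310, message_bytes), where the host name is hypothetical and 3310 is the port clamd conventionally listens on; use whatever you set in TCPSocket.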

Finally, if you have more clamds than MTAs, you may want to distribute
scan requests fairly (with load balancing and failover) across all the
available scanners. Again, you have several options here, ranging from
writing a small perl filter to manage the scan requests, to routing
mail to a second tier of MTAs (or amavisds) in a (possibly DNS-based)
round-robin fashion.
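For the "write your own filter" option, the load balancing and failover logic itself is small. Below is a Python sketch (swapped in for the perl filter mentioned above) of a round-robin pool that moves on to the next scanner when one is unreachable. The scan argument is any callable taking (host, port, data), for example a function speaking clamd's INSTREAM protocol; the class name and host names are hypothetical, and the class is not thread-safe as written.

```python
from typing import Callable, List, Tuple

ScanFn = Callable[[str, int, bytes], str]


class ScannerPool:
    """Round-robin over several clamd scanners with failover.

    Each request starts at the next scanner in turn; if that one raises
    OSError (connection refused, timeout, ...), the remaining scanners
    are tried in order. Not thread-safe: add a lock if shared."""

    def __init__(self, scanners: List[Tuple[str, int]], scan: ScanFn) -> None:
        if not scanners:
            raise ValueError("need at least one scanner")
        self._scanners = list(scanners)
        self._scan = scan
        self._next = 0  # index of the scanner to try first next time

    def scan(self, data: bytes) -> str:
        n = len(self._scanners)
        start = self._next
        self._next = (start + 1) % n  # spread load across scanners
        errors: List[str] = []
        for offset in range(n):
            host, port = self._scanners[(start + offset) % n]
            try:
                return self._scan(host, port, data)
            except OSError as exc:  # this scanner is down; try the next
                errors.append(f"{host}:{port}: {exc}")
        raise RuntimeError("all scanners failed: " + "; ".join(errors))
```

Compared with DNS round robin, doing it in the filter lets you skip a dead scanner immediately instead of waiting for a cached DNS answer to expire, at the cost of keeping the scanner list in your own configuration.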


HtH,
--acab
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
