> Hello, thanks for the post. Firstly, you are wrong about the performance of my
> computer - I don't have a supercomputer. I didn't run 10 000 000 messages
> through spamc/spamd. In fact the number is 100 000 000 and it is the max.
> size of message I run through spamc/spamd (notice that the number follows the
> -s parameter, s as in SIZE). The result of about 85 minutes is for about 17000
> messages (354MB). The average is 3,33 sec per message.

That number seems pretty high. I'm not experienced enough in the general
deployment of SA to say anything definite, so I can only contribute numbers and
hints from our own system. We use amavisd-new, which doesn't spawn SA for each
message but keeps it running all the time, and that saves a lot of time.

Amavisd/postfix/SA can be configured to offer a lot of parallelism and can thus 
take full advantage of available system resources. Currently we have 32 
parallel processes running on a rather small machine (2 cores, 3 GB RAM), and 
our average per message is around 1.5 seconds.

If you need to improve performance, I suggest you start by looking at the
machine. Do you have a lot of iowait? Get faster disks, or look at dividing
access between multiple drives. Are you swapping? Add more memory. Do you have
constantly high CPU usage? Add more CPUs.
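
For the iowait question, here is a rough sketch of how one could measure it on
a Linux box by sampling /proc/stat twice. It assumes the usual field order of
the aggregate "cpu" line (user, nice, system, idle, iowait, ...); the function
names are made up for this example, and tools like vmstat or top will show you
the same figure without any scripting.

```python
import time


def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Assumed order: user nice system idle iowait irq softirq steal ...
    # Guest time is already included in user/nice, so only sum the first 8.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart (Linux only)."""
    with open(path) as handle:
        before = cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = cpu_times(handle.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() a few times while mail is flowing should give a rough
idea of whether the disks are the bottleneck.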

Then start looking at the timing reports (I don't know if these are provided by 
SA or amavisd, so you might not have them in your setup). Each and every mail 
through the system has a timing report logged so you can see exactly how much 
time each step of the process took. It looks like this:

Aug  5 00:01:53 post amavis[30559]: (30559-07) TIMING-SA total 1438 ms - parse: 
1.60 (0.1%), extract_message_metadata: 35 (2.5%), get_uri_detail_list: 4 
(0.3%), tests_pri_-1000: 13 (0.9%), tests_pri_-950: 1.54 (0.1%), 
tests_pri_-900: 1.55 (0.1%), tests_pri_-400: 33 (2.3%), check_bayes: 31 (2.2%), 
tests_pri_0: 1280 (89.0%), check_dkim_adsp: 109 (7.6%), check_spf: 40 (2.8%), 
poll_dns_idle: 35 (2.4%), check_dcc: 525 (36.5%), check_razor2: 492 (34.2%), 
check_pyzor: 0.25 (0.0%), tests_pri_500: 28 (1.9%), learn: 23 (1.6%), 
get_report: 1.45 (0.1%)

Here you can see that check_dcc and check_razor2 are pretty expensive, because 
they have to query external servers. We are a low-traffic site (fewer than 50k
messages a day), so that's not a problem for us. But if you have a high volume
of traffic and the tests that depend on DNS lookups take a long time, you might
consider adding a local DNS server to your setup. Look at
http://www.spamtips.org/2011/07/spamassassin-why-run-your-own-dns.html for 
further information.
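
If you want to pick out the expensive steps without reading every report by
eye, something like the following sketch might help. It only assumes the
TIMING-SA layout shown in the sample above (name, milliseconds, percentage);
the function names are made up for this example and I haven't checked it
against other amavisd versions.

```python
import re

# The sample report from above, joined back onto one line.
SAMPLE = (
    "Aug  5 00:01:53 post amavis[30559]: (30559-07) TIMING-SA total 1438 ms - "
    "parse: 1.60 (0.1%), extract_message_metadata: 35 (2.5%), "
    "get_uri_detail_list: 4 (0.3%), tests_pri_-1000: 13 (0.9%), "
    "tests_pri_-950: 1.54 (0.1%), tests_pri_-900: 1.55 (0.1%), "
    "tests_pri_-400: 33 (2.3%), check_bayes: 31 (2.2%), "
    "tests_pri_0: 1280 (89.0%), check_dkim_adsp: 109 (7.6%), "
    "check_spf: 40 (2.8%), poll_dns_idle: 35 (2.4%), check_dcc: 525 (36.5%), "
    "check_razor2: 492 (34.2%), check_pyzor: 0.25 (0.0%), "
    "tests_pri_500: 28 (1.9%), learn: 23 (1.6%), get_report: 1.45 (0.1%)"
)

# One step looks like "check_dcc: 525 (36.5%)".
STEP_RE = re.compile(r"([\w-]+):\s+([\d.]+)\s+\(([\d.]+)%\)")


def parse_timing(line):
    """Return (total_ms, [(step, ms, percent), ...]) for a TIMING-SA line."""
    line = " ".join(line.split())  # undo any line wrapping
    _, marker, tail = line.partition("TIMING-SA total ")
    if not marker:
        raise ValueError("not a TIMING-SA line")
    total_text, _, steps_text = tail.partition(" ms - ")
    steps = [
        (name, float(ms), float(pct))
        for name, ms, pct in STEP_RE.findall(steps_text)
    ]
    return float(total_text), steps


def top_steps(steps, count=3):
    """Return the 'count' slowest steps, slowest first."""
    return sorted(steps, key=lambda step: step[1], reverse=True)[:count]


if __name__ == "__main__":
    total, steps = parse_timing(SAMPLE)
    for name, ms, pct in top_steps(steps):
        print(f"{name}: {ms:.0f} ms ({pct}% of {total:.0f} ms)")
```

For the sample this lists tests_pri_0, check_dcc and check_razor2. The
percentages add up to more than 100, so some entries apparently include the
time of others; compare individual checks rather than summing them.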


-- 
Lars
