Martin Gregorie-2 wrote:
> 
> On Mon, 2011-08-01 at 12:30 -0700, monolit wrote:
>> I tried to measure the performance of SpamAssassin using an SDBM database,
>> hoping for a performance improvement. This page,
>> http://wiki.apache.org/spamassassin/BayesBenchmarkResults
>> claims that by using an SDBM database instead of
>> Berkeley DB, SpamAssassin will be three times faster. That's why I did the
>> measurement. 
>> 
>> I expected a performance improvement when I converted the database format
>> from Berkeley DB to SDBM (as the link claims). But the tests didn't show
>> that. So now I don't know where the problem is.
>>
> If you have URIBL checks turned on I'd expect that the normal network
> delays for these will completely mask any performance difference you may
> get by swapping one fast database for another. Here are some numbers:
> 
> - the slowest single record Berkeley DB operation in a 2006 Oracle
>   benchmark (TDS no-sync writes with disk logs on a 2.0 GHz Windows
>   XP box) ran at 45,748 ops/sec, or 0.02 ms per operation
> 
> - pinging www.spamhaus.org just now took 30 ms.
> 
> Now consider that URIBL lookups are generally slower than that, but as
> they are all asynchronous, to a first approximation the time taken to
> handle the lot is the time taken by the slowest URIBL. Let's assume that
> the longest URIBL lookup takes 30 ms.
> 
> Let's further assume that each spam message contains 100 Bayes tokens, in
> which case looking them up in Bayes would take 2 ms, or 7% of the time
> needed to ping www.spamhaus.org. 
> 
> The impact of using a database that's 3 times faster? The time taken for
> the lookups is now 0.7 ms, and the time for ping + 100 lookups has
> changed from 32 ms to 30.7 ms - a reduction of 4%!
> 
> Now consider that: 
> - the slowest URIBL lookup will take a lot longer than 30 ms
> - we've entirely neglected the time taken by SA to scan a message
>   and run the regexes in the rules collection
> 
> IOW, in real life the speedup will be quite a lot less than the 4% I
> estimated.
> 
> You measured a speed up of 309 seconds in 87 minutes (5220 seconds), or
> about 6%, which, all things considered, is close to my rough estimate
> even if SDBM is really 3 times faster than Berkeley DB.
> 
> Running repeated tests on a fixed set of messages can tell you about the
> overall performance of SA, but very little about the time taken by any
> of its internal modules, and that's ignoring the falsified cache hit
> rate that you'll see if you run repeated tests on the same data set. I
> think you'd get better data by running a single test with diagnostics
> turned on and looking at the execution time of the various spamd
> components.
> 
> BTW, your results from running 10,000,000 messages through spamc/spamd
> give an elapsed average processing time of around 0.5 ms per message, a
> figure I find hard to believe unless you're running a supercomputer.
> 
> Admittedly, my system is at the opposite end of the power scale. For
> comparison, after a few runs on my 500 message spam corpus, so that all
> caches in my box and those in various routers out on the 'net are likely
> to be full, I can get down to 800-900 ms per message on a 1.6 GHz Core
> Duo with 1 GB RAM. My typical scan times, on an 866 MHz P3 box with 512 MB
> RAM, range from 1.1 seconds to 48.5 seconds (averaging 3.4 seconds)
> over the last 2111 messages processed.
> 
> 
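
The back-of-the-envelope estimate quoted above can be reproduced with a few
lines of Python. This is only a sketch of the arithmetic, not a measurement:
the 45,748 ops/sec, 30 ms and 100-token figures are the ones quoted above,
the 3x speedup is the wiki's claim, and the function and constant names are
made up for illustration.

```python
# Rough model from the quoted estimate: per-message time is one network
# round trip (the slowest URIBL lookup) plus N Bayes token lookups.
# All figures are the assumptions stated in the quoted post.

BDB_OPS_PER_SEC = 45_748   # slowest Berkeley DB op in the cited benchmark
NETWORK_MS = 30.0          # assumed slowest URIBL lookup (the ping time)
TOKENS_PER_MESSAGE = 100   # assumed Bayes tokens per message
SPEEDUP = 3.0              # claimed SDBM advantage over Berkeley DB


def message_time_ms(db_speedup: float = 1.0) -> float:
    """Network wait plus Bayes lookups, in milliseconds (hypothetical model)."""
    lookup_ms = 1000.0 / BDB_OPS_PER_SEC / db_speedup
    return NETWORK_MS + TOKENS_PER_MESSAGE * lookup_ms


def percent_saved(db_speedup: float = SPEEDUP) -> float:
    """Percentage reduction in per-message time from a faster database."""
    before = message_time_ms()
    after = message_time_ms(db_speedup)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(f"Berkeley DB: {message_time_ms():.1f} ms per message")
    print(f"3x faster:   {message_time_ms(SPEEDUP):.1f} ms per message")
    print(f"saving:      {percent_saved():.1f} %")
```

Without the rounding used in the quoted text, the saving comes out between
4% and 5%, and it shrinks further as NETWORK_MS grows or as rule-scanning
time is added to the model.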

Hello, thanks for the post. Firstly, you are wrong about the performance of my
computer - I don't have a supercomputer. I didn't run 10 000 000 messages
through spamc/spamd. In fact the number is 100 000 000, and it is the maximum
size of a message I run through spamc/spamd (notice that the number follows the
-s parameter, s as in SIZE). The result of about 85 minutes is for about 17000
messages (354 MB). The average is about 0.3 sec per message (3.33 messages
per second).
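
For anyone checking the figures, the per-message average follows directly
from the run time and the message count. A minimal Python sketch, using the
rounded values from this post (about 85 minutes, about 17000 messages); the
function name is made up for illustration. It works out to about 0.3 seconds
per message, i.e. about 3.33 messages per second.

```python
def seconds_per_message(total_minutes: float, messages: int) -> float:
    """Average wall-clock seconds spent per message."""
    return total_minutes * 60.0 / messages


if __name__ == "__main__":
    avg = seconds_per_message(85, 17_000)
    print(f"{avg:.2f} s per message, {1 / avg:.2f} messages per second")
```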

The reason why I am doing the test is that my boss gave me a task - IMPROVE THE
SPAMASSASSIN PERFORMANCE. 
He saw the results of the test I posted
(http://wiki.apache.org/spamassassin/BayesBenchmarkResults); an explanation of
the individual parts of the test is here:
http://wiki.apache.org/spamassassin/BayesBenchmark

So now he wants me to increase the performance of our SpamAssassin.
-- 
View this message in context: 
http://old.nabble.com/Conversion-Spamassassin%28bayes%29-database-to-SDBM-tp32160172p32194013.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
