Axb wrote:
> 
> On 2011-08-01 16:50, monolit939 wrote:
>>
>>
>> Axb wrote:
>>>
>>> On 2011-08-01 9:52, monolit939 wrote:
>>>>
>>>>
>>>> Axb wrote:
>>>>>
>>>>> wrong!
>>>>>
>>>>> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt
>>>>>
>>>>> see "bayes_path"
>>>>>
>>>>> in your case:
>>>>> bayes_path /var/mail/.spamassassin/bayes
>>>>>
>>>>
>>>> Hello,
>>>>
>>>> First, thank you for your advice. I added bayes_path
>>>> /var/mail/.spamassassin/bayes to local.cf and followed the steps you
>>>> recommended in your previous post, BUT I performed them as root. I think
>>>> the conversion from Berkeley DB to SDBM was successful. Unfortunately,
>>>> SpamAssassin gives the same results with Berkeley DB and SDBM.
>>>>
>>>> I am not sure whether SpamAssassin really uses the SDBM database when
>>>> scanning mail. I performed the following as root:
>>>>
>>>> 1) stop spamd
>>>> 2) sa-learn --backup > /tmp/bayes_export
>>>> 3) add the following lines to local.cf
>>>> bayes_store_module           Mail::SpamAssassin::BayesStore::SDBM
>>>> bayes_path /var/mail/.spamassassin/bayes
>>>> 4) sa-learn --restore /tmp/bayes_export
>>>>
>>>> Testing the change:
>>>> 5) spamassassin -D --lint 2>&1 | grep -i bayes   # I didn't notice any error
>>>> Jul 31 19:53:39.813 [2485] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
>>>> Jul 31 19:53:39.887 [2485] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
>>>> Jul 31 19:53:40.688 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_new', priority 0
>>>> Jul 31 19:53:40.688 [2485] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0), bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
>>>> Jul 31 19:53:40.702 [2485] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::SDBM=HASH(0xb167590)
>>>> Jul 31 19:53:40.702 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_is_scan_available', priority 0
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_toks
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_seen
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: found bayes db version 3
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.729 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.730 [2485] dbg: bayes: corpus size: nspam = 311537, nham = 240966
>>>> Jul 31 19:53:40.734 [2485] dbg: bayes: score = 0.468256978075479
>>>> Jul 31 19:53:40.735 [2485] dbg: bayes: DB expiry: tokens in DB: 118976, Expiry max size: 150000, Oldest atime: 1255330288, Newest atime: 1266342672, Last expire: 0, Current time: 1312134820
>>>> Jul 31 19:53:40.735 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.745 [2485] dbg: bayes: untie-ing
>>>> Jul 31 19:53:41.074 [2485] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>>>> Jul 31 19:53:41.135 [2485] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>>>> Jul 31 19:53:41.136 [2485] dbg: timing: total 1327 ms - init: 896 (67.5%), parse: 0.71 (0.1%), extract_message_metadata: 1.30 (0.1%), get_uri_detail_list: 1.11 (0.1%), tests_pri_-1000: 8 (0.6%), compile_gen: 151 (11.4%), compile_eval: 17 (1.3%), tests_pri_-950: 5 (0.3%), tests_pri_-900: 5 (0.4%), tests_pri_-400: 21 (1.6%), check_bayes: 16 (1.2%), tests_pri_0: 337 (25.4%), tests_pri_500: 51 (3.8%)
>>>> 6) if you see no errors, restart spamd
>>>> 7) ls -lh /var/mail/.spamassassin/*
>>>> -rw-r--r-- 1 mail root  12K 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist
>>>> -rw-r--r-- 1 mail root    6 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist.mutex
>>>> -rw-r--r-- 1 mail root 2.7K 2011-07-31 19:53 /var/mail/.spamassassin/bayes_journal
>>>> -rw-rw-r-- 1 mail root 3.8K 2011-07-31 19:50 /var/mail/.spamassassin/bayes.mutex
>>>> -rw-r--r-- 1 mail root  78M 2010-02-09 12:40 /var/mail/.spamassassin/bayes_seen
>>>> -rw----r-- 1 root root  16K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.dir
>>>> -rw----r-- 1 root root 128M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.pag
>>>> -rw-r--r-- 1 mail root 5.1M 2010-02-16 18:51 /var/mail/.spamassassin/bayes_toks
>>>> -rw----r-- 1 root root 4.0K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.dir
>>>> -rw----r-- 1 root root 4.0M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.pag
>>>> -rw-r--r-- 1 mail root 1.2K 2010-02-09 10:20 /var/mail/.spamassassin/user_prefs
>>>>
>>>> file /var/mail/.spamassassin/*
>>>> /var/mail/.spamassassin/auto-whitelist:       Berkeley DB (Hash, version 8, native byte-order)
>>>> /var/mail/.spamassassin/auto-whitelist.mutex: ASCII text
>>>> /var/mail/.spamassassin/bayes_journal:        ASCII text
>>>> /var/mail/.spamassassin/bayes.mutex:          ASCII text
>>>> /var/mail/.spamassassin/bayes_seen:           Berkeley DB (Hash, version 8, native byte-order)
>>>> /var/mail/.spamassassin/bayes_seen.dir:       DOS executable (device driver) for DOS
>>>> /var/mail/.spamassassin/bayes_seen.pag:       data
>>>> /var/mail/.spamassassin/bayes_toks:           Berkeley DB (Hash, version 9, native byte-order)
>>>> /var/mail/.spamassassin/bayes_toks.dir:       DOS executable (device driver) for DOS
>>>> /var/mail/.spamassassin/bayes_toks.pag:       data
>>>> /var/mail/.spamassassin/mnt:                  setgid directory
>>>> /var/mail/.spamassassin/ol:                   setgid directory
>>>> /var/mail/.spamassassin/user_prefs:           ASCII English text
>>>>
>>>>
>>>> Finally, I ran this script:
>>>> #! /bin/bash
>>>>
>>>> for i in /path/to/emails/*; do
>>>>          spamc -c -s 10000000 < "$i"
>>>> done
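The loop above can be wrapped in a small function so that different setups are timed against identical input. This is only a sketch, not part of the original post: the name `scan_all` is hypothetical, and it assumes the messages are plain files in a single directory. It globs the directory instead of parsing `ls`, so each redirect gets a full path and file names containing spaces still work.

```shell
# scan_all DIR CMD...  (hypothetical helper)
# Feeds every regular file in DIR to CMD on stdin, discards CMD's output,
# and prints how many messages were scanned. Example:
#   time scan_all /path/to/emails spamc -c -s 10000000
scan_all() {
    local dir="$1" f n=0
    shift
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
        "$@" < "$f" > /dev/null          # exit status ignored on purpose
        n=$((n + 1))
    done
    echo "$n"
}
```

Exit codes from the scanner are deliberately ignored, because `spamc -c` sets a non-zero status for messages it classifies as spam.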
>>>>
>>>> Results:
>>>> Scanning with Berkeley DB:
>>>> real       87m2.779s
>>>> user       0m16.881s
>>>> sys        0m33.826s
>>>>
>>>> Scanning with SDBM:
>>>> real       86m32.543s
>>>> user       0m17.105s
>>>> sys        0m33.802s
>>>>
>>>> As you can see, the results are almost the same. I suspect that during
>>>> the second test (with SDBM) SpamAssassin was still using the Berkeley DB
>>>> database.
>>>>
>>>> Is there any way to find out which kind of database SpamAssassin uses?
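One way to answer this from the `spamassassin -D --lint` output is to extract the `bayes_store_module=` value, which is what the reply below points to. Here is a minimal sketch, assuming the 3.3.x debug wording shown in the log above and unwrapped log lines; `bayes_store_from_debug` is a hypothetical helper name. As far as I can tell, the tied path is logged without a suffix even for SDBM (Perl's SDBM_File adds `.dir`/`.pag` itself), so the module line is the more informative one.

```shell
# bayes_store_from_debug  (hypothetical helper)
# Reads `spamassassin -D` debug output on stdin and prints the Bayes store
# module and the files it tied. Assumes lines such as:
#   ... bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
#   ... bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_toks
bayes_store_from_debug() {
    local line module="" files=""
    while IFS= read -r line; do
        case $line in
            *bayes_store_module=*)
                module=${line##*bayes_store_module=} ;;   # text after the '='
            *"tie-ing to DB file"*)
                files="$files${line##* } " ;;             # last field = path
        esac
    done
    printf 'module: %s\n' "${module:-unknown}"
    printf 'files: %s\n' "${files% }"
}
```

Usage: `spamassassin -D --lint 2>&1 | bayes_store_from_debug`.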
>>>
>>> you're seeing it:
>>> bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
>>>
>>> move away the old files (you don't need these anymore)
>>> bayes_toks
>>> bayes_seen
>>> bayes_journal
>>>
>>> SDBM files are *.dir and *.pag
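The file-name rule above can be checked mechanically. This sketch assumes the default `bayes_toks`/`bayes_seen` names under the `bayes_path` prefix used in this thread; `bayes_files_format` is a hypothetical name. A `.dir`/`.pag` pair is treated as SDBM, while a suffix-less file is only *probably* Berkeley DB (`file` confirms that, as in the listing above).

```shell
# bayes_files_format [DIR]  (hypothetical helper)
# Guesses the on-disk format of each Bayes database in DIR from file names.
bayes_files_format() {
    local dir="${1:-/var/mail/.spamassassin}" db sdbm single
    for db in bayes_toks bayes_seen; do
        sdbm=0; single=0
        if [ -e "$dir/$db.dir" ] && [ -e "$dir/$db.pag" ]; then sdbm=1; fi
        if [ -e "$dir/$db" ]; then single=1; fi
        case "$sdbm$single" in
            10) echo "$db: SDBM" ;;
            01) echo "$db: single file (probably Berkeley DB)" ;;
            11) echo "$db: SDBM plus a leftover single file" ;;
            *)  echo "$db: missing" ;;
        esac
    done
}
```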
>>>
>>>
>>
>> Hello,
>>
>> I'm afraid that doesn't work either. Here is what I did:
>>
>> 1) remove the old files as you recommended (have a look):
>> /var/mail/.spamassassin# ls -la
>> -rw----r-- 1 root root     16384 2011-07-31 19:51 bayes_seen.dir
>> -rw----r-- 1 root root 134169600 2011-07-31 19:51 bayes_seen.pag
>> -rw----r-- 1 root root      4096 2011-07-31 19:51 bayes_toks.dir
>> -rw----r-- 1 root root   4194304 2011-07-31 19:51 bayes_toks.pag
>>
>> 2) stop spamassassin
>> 3) start spamassassin
>> 4) start the script
>> #! /bin/bash
>> for i in /path/to/emails/*; do
>>           spamc -c -s 10000000 < "$i"
>> done
>>
>> The results:
>> real 84m55.472s
>> user 0m17.145s
>> sys  0m34.466s
>>
>> Unfortunately, the results are the same as before. It probably means that
>> SpamAssassin still uses the same type of database (Berkeley DB).
>>
>> Any idea what could be wrong?
> 
> nothing seems wrong.
> 
> I have no idea what you're trying to prove or measure.
> Bayes on steroids?
> 
> if whatever user runs your spamd can read/write to bayes then you're set.
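This is worth checking here. In the `ls -lh` listing earlier in the thread, the new `.dir`/`.pag` files are owned by `root:root` with mode `-rw----r--`, so any account other than root can at most read them. The sketch below reports every Bayes file the current user cannot both read and write, so it should be run as the user spamd runs as (root passes these tests regardless of mode). `check_bayes_access` is a hypothetical name. Which user spamd uses depends on how it is started; the old files being owned by `mail` may be a hint.

```shell
# check_bayes_access [DIR]  (hypothetical helper)
# Prints each bayes_* file in DIR that the current user cannot read and
# write; returns non-zero if any were found.
check_bayes_access() {
    local dir="${1:-/var/mail/.spamassassin}" f bad=0
    for f in "$dir"/bayes_*; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ ! -r "$f" ] || [ ! -w "$f" ]; then
            echo "no read/write access: $f"
            bad=1
        fi
    done
    return "$bad"
}
```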
> 
> sa-learn --dump magic
> will show you in what state your bayes DB is in.
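`sa-learn --dump magic` prints one line per counter, and a few of those are enough to confirm that the restored database holds the expected corpus. This sketch assumes the usual layout, where the value is in the third column and each line ends with `non-token data: NAME`; `bayes_magic` is a hypothetical name.

```shell
# bayes_magic  (hypothetical helper)
# Reads `sa-learn --dump magic` output on stdin and prints NAME=VALUE for a
# few counters. Assumed input line format:
#   0.000          0     311537          0  non-token data: nspam
bayes_magic() {
    awk '/non-token data: (bayes db version|nspam|nham|ntokens)$/ {
        name = $0
        sub(/.*non-token data: /, "", name)   # keep only the counter name
        print name "=" $3                     # third column holds the value
    }'
}
```

Usage: `sa-learn --dump magic | bayes_magic`. For the database in this thread, `nspam` and `nham` should match the `corpus size` line in the debug log above (311537 and 240966).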
> 
> if you need more help, start by checking
> http://spamassassin.apache.org/full/3.3.x/doc/
> 
> maybe somebody else can chip in and figure out what you need.
> 
> 

I was trying to measure SpamAssassin's performance with the SDBM database
because I hoped it would improve performance. The BayesBenchmarkResults page
(http://wiki.apache.org/spamassassin/BayesBenchmarkResults) claims that
SpamAssassin is three times faster with SDBM than with Berkeley DB. That's
why I did the measurement.

After converting the database from Berkeley DB to SDBM, I expected the
performance to improve as the page claims, but the tests didn't show
that. So now I don't know where the problem is.
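To put a number on "almost the same", the `time` totals can be converted to seconds and divided. This sketch uses awk for the floating-point arithmetic; `to_seconds` and `speedup` are hypothetical names, and only the `NmS.SSSs` format printed by bash's `time` is handled. A three-fold gain, as the benchmark page is said to claim, would show up as a ratio near 3.

```shell
# to_seconds TIME  (hypothetical helper): "87m2.779s" -> "5222.779"
# Only the minutes+seconds format printed by bash's `time` is supported.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")            # p[1] = minutes, p[2] = "S.SSSs"
        sub(/s$/, "", p[2])         # drop the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# speedup OLD NEW  (hypothetical helper): how many times faster NEW is than OLD
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", a / b }'
}
```

For the first two runs, `speedup 87m2.779s 86m32.543s` prints `1.006`, which is a difference of well under 1%.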
-- 
View this message in context: 
http://old.nabble.com/Conversion-Spamassassin%28bayes%29-database-to-SDBM-tp32160172p32172509.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
