Axb wrote:
>
> On 2011-08-01 16:50, monolit939 wrote:
>>
>> Axb wrote:
>>>
>>> On 2011-08-01 9:52, monolit939 wrote:
>>>>
>>>> Axb wrote:
>>>>>
>>>>> wrong!
>>>>>
>>>>> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt
>>>>>
>>>>> see "bayes_path"
>>>>>
>>>>> in your case:
>>>>> bayes_path /var/mail/.spamassassin/bayes
>>>>>
>>>>
>>>> Hello,
>>>>
>>>> firstly, thank you for your advice. I added bayes_path
>>>> /var/mail/.spamassassin/bayes to local.cf. I used the steps you
>>>> recommended in the previous post, BUT I performed them as user root.
>>>> I think that the conversion from Berkeley DB to SDBM was successful.
>>>> Unfortunately SpamAssassin gives the same results with Berkeley DB
>>>> and SDBM.
>>>>
>>>> I am not sure if SpamAssassin really uses the SDBM database when
>>>> scanning mails. I performed the following as root:
>>>>
>>>> 1) stop spamd
>>>> 2) sa-learn --backup > /tmp/bayes_export
>>>> 3) add the following lines to local.cf
>>>>    bayes_store_module Mail::SpamAssassin::BayesStore::SDBM
>>>>    bayes_path /var/mail/.spamassassin/bayes
>>>> 4) sa-learn --restore /tmp/bayes_export
>>>>
>>>> test change:
>>>> 5) spamassassin -D --lint 2>&1 | grep -i bayes   # I didn't notice any error
>>>> Jul 31 19:53:39.813 [2485] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
>>>> Jul 31 19:53:39.887 [2485] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
>>>> Jul 31 19:53:40.688 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_new', priority 0
>>>> Jul 31 19:53:40.688 [2485] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0), bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
>>>> Jul 31 19:53:40.702 [2485] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::SDBM=HASH(0xb167590)
>>>> Jul 31 19:53:40.702 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_is_scan_available', priority 0
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_toks
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_seen
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: found bayes db version 3
>>>> Jul 31 19:53:40.703 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.729 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.730 [2485] dbg: bayes: corpus size: nspam = 311537, nham = 240966
>>>> Jul 31 19:53:40.734 [2485] dbg: bayes: score = 0.468256978075479
>>>> Jul 31 19:53:40.735 [2485] dbg: bayes: DB expiry: tokens in DB: 118976, Expiry max size: 150000, Oldest atime: 1255330288, Newest atime: 1266342672, Last expire: 0, Current time: 1312134820
>>>> Jul 31 19:53:40.735 [2485] dbg: bayes: DB journal sync: last sync: 0
>>>> Jul 31 19:53:40.745 [2485] dbg: bayes: untie-ing
>>>> Jul 31 19:53:41.074 [2485] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
>>>> Jul 31 19:53:41.135 [2485] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
>>>> Jul 31 19:53:41.136 [2485] dbg: timing: total 1327 ms - init: 896 (67.5%), parse: 0.71 (0.1%), extract_message_metadata: 1.30 (0.1%), get_uri_detail_list: 1.11 (0.1%), tests_pri_-1000: 8 (0.6%), compile_gen: 151 (11.4%), compile_eval: 17 (1.3%), tests_pri_-950: 5 (0.3%), tests_pri_-900: 5 (0.4%), tests_pri_-400: 21 (1.6%), check_bayes: 16 (1.2%), tests_pri_0: 337 (25.4%), tests_pri_500: 51 (3.8%)
>>>> if you see no errors
>>>> 6) restart spamd
>>>> 7) ls -lh /var/mail/.spamassassin/*
>>>> -rw-r--r-- 1 mail root  12K 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist
>>>> -rw-r--r-- 1 mail root    6 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist.mutex
>>>> -rw-r--r-- 1 mail root 2.7K 2011-07-31 19:53 /var/mail/.spamassassin/bayes_journal
>>>> -rw-rw-r-- 1 mail root 3.8K 2011-07-31 19:50 /var/mail/.spamassassin/bayes.mutex
>>>> -rw-r--r-- 1 mail root  78M 2010-02-09 12:40 /var/mail/.spamassassin/bayes_seen
>>>> -rw----r-- 1 root root  16K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.dir
>>>> -rw----r-- 1 root root 128M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.pag
>>>> -rw-r--r-- 1 mail root 5.1M 2010-02-16 18:51 /var/mail/.spamassassin/bayes_toks
>>>> -rw----r-- 1 root root 4.0K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.dir
>>>> -rw----r-- 1 root root 4.0M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.pag
>>>> -rw-r--r-- 1 mail root 1.2K 2010-02-09 10:20 /var/mail/.spamassassin/user_prefs
>>>>
>>>> file /var/mail/.spamassassin/*
>>>> /var/mail/.spamassassin/auto-whitelist:       Berkeley DB (Hash, version 8, native byte-order)
>>>> /var/mail/.spamassassin/auto-whitelist.mutex: ASCII text
>>>> /var/mail/.spamassassin/bayes_journal:        ASCII text
>>>> /var/mail/.spamassassin/bayes.mutex:          ASCII text
>>>> /var/mail/.spamassassin/bayes_seen:           Berkeley DB (Hash, version 8, native byte-order)
>>>> /var/mail/.spamassassin/bayes_seen.dir:       DOS executable (device driver) for DOS
>>>> /var/mail/.spamassassin/bayes_seen.pag:       data
>>>> /var/mail/.spamassassin/bayes_toks:           Berkeley DB (Hash, version 9, native byte-order)
>>>> /var/mail/.spamassassin/bayes_toks.dir:       DOS executable (device driver) for DOS
>>>> /var/mail/.spamassassin/bayes_toks.pag:       data
>>>> /var/mail/.spamassassin/mnt:                  setgid directory
>>>> /var/mail/.spamassassin/ol:                   setgid directory
>>>> /var/mail/.spamassassin/user_prefs:           ASCII English text
>>>>
>>>>
>>>> Finally I started this script:
>>>> #!/bin/bash
>>>>
>>>> for i in /path/to/emails/*; do
>>>>     spamc -c -s 10000000 < "$i"
>>>> done
>>>>
>>>> Results:
>>>> Scanning with Berkeley DB:
>>>> real 87m2.779s
>>>> user 0m16.881s
>>>> sys 0m33.826s
>>>>
>>>> Scanning with SDBM:
>>>> real 86m32.543s
>>>> user 0m17.105s
>>>> sys 0m33.802s
>>>>
>>>> As you can see the results are almost the same. I suspect that
>>>> SpamAssassin still used the Berkeley database during the second
>>>> test (with SDBM).
>>>>
>>>> Is there any way to find out which kind of database SpamAssassin
>>>> uses?
>>>
>>> you're seeing it:
>>> bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
>>>
>>> move away the old files (you don't need these anymore)
>>> bayes_toks
>>> bayes_seen
>>> bayes_journal
>>>
>>> SDBM files are *.dir *.pag
>>>
>>>
>>
>> Hello,
>>
>> I am afraid that doesn't work either. What I have done:
>>
>> 1) removed the old files as you recommended (have a look):
>> /var/mail/.spamassassin# ls -la
>> -rw----r-- 1 root root     16384 2011-07-31 19:51 bayes_seen.dir
>> -rw----r-- 1 root root 134169600 2011-07-31 19:51 bayes_seen.pag
>> -rw----r-- 1 root root      4096 2011-07-31 19:51 bayes_toks.dir
>> -rw----r-- 1 root root   4194304 2011-07-31 19:51 bayes_toks.pag
>>
>> 2) stop spamassassin
>> 3) start spamassassin
>> 4) start the script
>> #!/bin/bash
>> for i in /path/to/emails/*; do
>>     spamc -c -s 10000000 < "$i"
>> done
>>
>> The results:
>> real 84m55.472s
>> user 0m17.145s
>> sys 0m34.466s
>>
>> Unfortunately the results are the same as before. It probably means
>> that SpamAssassin still uses the same type of database (Berkeley DB).
>>
>> Any idea what could be wrong?
>
> nothing seems wrong.
>
> I have no idea what you're trying to prove or measure.
> Bayes on steroids?
>
> if whatever user runs your spamd can read/write to bayes then you're set.
>
> sa-learn --dump magic
> will show you in what state your bayes DB is in.
>
> if you need more help, start by checking
> http://spamassassin.apache.org/full/3.3.x/doc/
>
> maybe somebody else can chip in and figure out what you need.
>

I was trying to measure SpamAssassin's performance with the SDBM database, because I expected it to be faster. The wiki page http://wiki.apache.org/spamassassin/BayesBenchmarkResults claims that SpamAssassin is three times faster with an SDBM database than with Berkeley DB. That is why I ran the measurement. After converting the database from Berkeley DB to SDBM I expected the improvement the page describes, but the tests did not show it. So now I do not know where the problem is.
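
One thing that may be worth checking before comparing wall-clock times is how much of a scan is spent in the Bayes lookup at all. The sketch below is only an illustration: it assumes the "timing:" debug line has the format quoted earlier in this thread, and bayes_share is a made-up helper name, not part of SpamAssassin. It prints the check_bayes entry next to the total.

```shell
#!/bin/bash
# bayes_share: hypothetical helper, not part of SpamAssassin.
# Reads debug output on stdin and, for every "timing:" line, prints the
# time spent in check_bayes next to the total scan time.
# Assumes the line format quoted in this thread, e.g.
#   ... dbg: timing: total 1327 ms - init: 896 (67.5%), ..., check_bayes: 16 (1.2%), ...
bayes_share() {
    # sed -n with the p flag prints only the lines where the pattern matched
    sed -n 's/.*timing: total \([0-9.]*\) ms.*check_bayes: \([0-9.]*\) (\([0-9.]*\)%).*/check_bayes: \2 ms of \1 ms total (\3%)/p'
}
```

It could be fed from the same command used in step 5, for example `spamassassin -D --lint 2>&1 | bayes_share`. In the --lint run quoted above the line reads check_bayes: 16 out of a total of 1327 ms, but that run scans a tiny built-in test message, so the share for real mail may differ. If the share stays small for real messages too, a faster Bayes store would hardly change the total run time; that is a hypothesis to test, not a conclusion.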