On 2011-08-01 9:52, monolit939 wrote:
Axb wrote:
wrong!
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt
see "bayes_path"
in your case:
bayes_path /var/mail/.spamassassin/bayes
Hello,
first of all, thank you for your advice. I added bayes_path
/var/mail/.spamassassin/bayes to local.cf and followed the steps you
recommended in your previous post, BUT I performed them as user root. I think
the conversion from Berkeley DB to SDBM was successful. Unfortunately,
SpamAssassin gives the same results with Berkeley DB and with SDBM.
I am not sure whether SpamAssassin really uses the SDBM database while
scanning mail. I performed the following as root:
1) stop spamd
2) sa-learn --backup > /tmp/bayes_export
3) add the following lines to local.cf
bayes_store_module Mail::SpamAssassin::BayesStore::SDBM
bayes_path /var/mail/.spamassassin/bayes
4) sa-learn --restore /tmp/bayes_export
test change:
5) spamassassin -D --lint 2>&1 | grep -i bayes # I didn't notice any errors
Jul 31 19:53:39.813 [2485] dbg: config: read file
/usr/share/spamassassin/23_bayes.cf
Jul 31 19:53:39.887 [2485] dbg: plugin: loading
Mail::SpamAssassin::Plugin::Bayes from @INC
Jul 31 19:53:40.688 [2485] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_new',
priority 0
Jul 31 19:53:40.688 [2485] dbg: bayes: learner_new
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0),
bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
Jul 31 19:53:40.702 [2485] dbg: bayes: learner_new: got
store=Mail::SpamAssassin::BayesStore::SDBM=HASH(0xb167590)
Jul 31 19:53:40.702 [2485] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements
'learner_is_scan_available', priority 0
Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O
/var/mail/.spamassassin/bayes_toks
Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O
/var/mail/.spamassassin/bayes_seen
Jul 31 19:53:40.703 [2485] dbg: bayes: found bayes db version 3
Jul 31 19:53:40.703 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.729 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.730 [2485] dbg: bayes: corpus size: nspam = 311537, nham =
240966
Jul 31 19:53:40.734 [2485] dbg: bayes: score = 0.468256978075479
Jul 31 19:53:40.735 [2485] dbg: bayes: DB expiry: tokens in DB: 118976,
Expiry max size: 150000, Oldest atime: 1255330288, Newest atime: 1266342672,
Last expire: 0, Current time: 1312134820
Jul 31 19:53:40.735 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.745 [2485] dbg: bayes: untie-ing
Jul 31 19:53:41.074 [2485] dbg: rules: ran eval rule BAYES_50 ======> got
hit (1)
Jul 31 19:53:41.135 [2485] dbg: check:
tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Jul 31 19:53:41.136 [2485] dbg: timing: total 1327 ms - init: 896 (67.5%),
parse: 0.71 (0.1%), extract_message_metadata: 1.30 (0.1%),
get_uri_detail_list: 1.11 (0.1%), tests_pri_-1000: 8 (0.6%), compile_gen:
151 (11.4%), compile_eval: 17 (1.3%), tests_pri_-950: 5 (0.3%),
tests_pri_-900: 5 (0.4%), tests_pri_-400: 21 (1.6%), check_bayes: 16 (1.2%),
tests_pri_0: 337 (25.4%), tests_pri_500: 51 (3.8%)
if you see no errors
6) restart spamd
7) ls -lh /var/mail/.spamassassin/*
-rw-r--r-- 1 mail root 12K 2010-02-16 19:39
/var/mail/.spamassassin/auto-whitelist
-rw-r--r-- 1 mail root 6 2010-02-16 19:39
/var/mail/.spamassassin/auto-whitelist.mutex
-rw-r--r-- 1 mail root 2.7K 2011-07-31 19:53
/var/mail/.spamassassin/bayes_journal
-rw-rw-r-- 1 mail root 3.8K 2011-07-31 19:50
/var/mail/.spamassassin/bayes.mutex
-rw-r--r-- 1 mail root 78M 2010-02-09 12:40
/var/mail/.spamassassin/bayes_seen
-rw----r-- 1 root root 16K 2011-07-31 19:51
/var/mail/.spamassassin/bayes_seen.dir
-rw----r-- 1 root root 128M 2011-07-31 19:51
/var/mail/.spamassassin/bayes_seen.pag
-rw-r--r-- 1 mail root 5.1M 2010-02-16 18:51
/var/mail/.spamassassin/bayes_toks
-rw----r-- 1 root root 4.0K 2011-07-31 19:51
/var/mail/.spamassassin/bayes_toks.dir
-rw----r-- 1 root root 4.0M 2011-07-31 19:51
/var/mail/.spamassassin/bayes_toks.pag
-rw-r--r-- 1 mail root 1.2K 2010-02-09 10:20
/var/mail/.spamassassin/user_prefs
file /var/mail/.spamassassin/*
/var/mail/.spamassassin/auto-whitelist: Berkeley DB (Hash, version 8,
native byte-order)
/var/mail/.spamassassin/auto-whitelist.mutex: ASCII text
/var/mail/.spamassassin/bayes_journal: ASCII text
/var/mail/.spamassassin/bayes.mutex: ASCII text
/var/mail/.spamassassin/bayes_seen: Berkeley DB (Hash, version 8,
native byte-order)
/var/mail/.spamassassin/bayes_seen.dir: DOS executable (device driver)
for DOS
/var/mail/.spamassassin/bayes_seen.pag: data
/var/mail/.spamassassin/bayes_toks: Berkeley DB (Hash, version 9,
native byte-order)
/var/mail/.spamassassin/bayes_toks.dir: DOS executable (device driver)
for DOS
/var/mail/.spamassassin/bayes_toks.pag: data
/var/mail/.spamassassin/mnt: setgid directory
/var/mail/.spamassassin/ol: setgid directory
/var/mail/.spamassassin/user_prefs: ASCII English text
Finally I started this script:
#! /bin/bash
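# Glob instead of parsing ls, so each path keeps its directory and
# file names containing spaces survive.
for i in /path/to/emails/*; do
    spamc -c -s 10000000 < "$i"
done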
Results:
Scanning with Berkeley DB:
real 87m2.779s
user 0m16.881s
sys 0m33.826s
Scanning with SDBM:
real 86m32.543s
user 0m17.105s
sys 0m33.802s
As you can see, the results are almost the same. I suspect that SpamAssassin
was still using the Berkeley DB database during the second test (with SDBM).
Is there any way to find out which kind of database SpamAssassin uses?
you're seeing it:
bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
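If you want to pull just that value out of the debug output instead of reading
the whole log, something like the following might do. It is only a sketch: the
function name bayes_store_from_debug is made up for illustration, and it
assumes the debug output contains a "bayes_store_module=..." string on one
line, as in the log above.

```shell
# Hypothetical helper (not part of SpamAssassin): read debug output on
# stdin and print each distinct Bayes store module named in it.
bayes_store_from_debug() {
    # grep -o keeps only the matching part of each line; sed strips the key.
    grep -o 'bayes_store_module=[A-Za-z0-9_:]*' \
        | sed 's/^bayes_store_module=//' \
        | sort -u
}

# Intended use (needs a working SpamAssassin install):
#   spamassassin -D --lint 2>&1 | bayes_store_from_debug
```

This only reports what the command-line lint run loaded. spamd reads its
configuration at startup, so it has to be restarted before it picks up the
change.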
move away the old files (you don't need these anymore)
bayes_toks
bayes_seen
bayes_journal
SDBM files are *.dir and *.pag
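Moving the old files away could look roughly like this. Treat it as a sketch:
the function name and the backup directory are hypothetical, the file names
are the ones from the directory listing above, and it assumes spamd is stopped
while the files are moved.

```shell
# Hypothetical helper: move the old Berkeley DB Bayes files from DIR into
# BACKUP_DIR, leaving the SDBM files (*.dir, *.pag) where they are.
move_old_bayes_files() {
    dir=$1
    backup=$2
    mkdir -p "$backup" || return 1
    for name in bayes_toks bayes_seen bayes_journal; do
        # Exact names only, so bayes_toks.dir and bayes_toks.pag are not touched.
        if [ -f "$dir/$name" ]; then
            mv -- "$dir/$name" "$backup/" || return 1
        fi
    done
}

# Intended use (the backup path is an assumption, pick your own):
#   move_old_bayes_files /var/mail/.spamassassin /var/mail/.spamassassin/old-bdb
```

Moving rather than deleting keeps the old database around until the SDBM
setup has been confirmed to work.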