On 01/28/2015 04:38 PM, Reindl Harald wrote:
bayes_seen is AFAIK relevant in the context of sa-learn, so the same
messages are not re-trained again and again - and it has its own bugs:
for a few messages it contains random parts of the message itself, so
firing sa-learn on the whole corpus would add those messages to
"bayes_toks" each time; see the two example snippets below.
Hence it is that large here:
-rw------- 1 sa-milt sa-milt 5,4K 2015-01-28 16:34 bayes_journal
-rw------- 1 sa-milt sa-milt 1,3M 2015-01-28 16:12 bayes_seen
-rw------- 1 sa-milt sa-milt 40M 2015-01-28 16:33 bayes_toks
-rw------- 1 sa-milt sa-milt 98 2014-08-21 17:47 user_prefs
_________________________________________________
Something here does NOT make sense:
1.3 MB of bayes_seen against 40 MB of tokens.
Someone please correct me if I'm wrong: AFAIK this probably means
you've deleted bayes_seen at some point, so Bayes has lost its record
of what it has already processed and will relearn messages you
already fed it.
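One way to check whether duplicate learning is actually happening is to look at the Bayes DB's internal counters with sa-learn's --dump magic option (a sketch; run it as the user that owns the Bayes files, sa-milt in the listing above):

```shell
# Print the Bayes DB "magic" variables: number of spam/ham messages
# learned, total token count, and token access-time ranges.
sa-learn --dump magic

# A token count far out of proportion to the corpus size suggests the
# same messages have been learned repeatedly, e.g. after bayes_seen
# was deleted or corrupted.
```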
Also, a 40 MB token DB file will not exactly help your speed.
If you don't want to use Redis, then at least use SDBM, which is considerably faster.
local.cf:
bayes_store_module Mail::SpamAssassin::BayesStore::SDBM
and restore/relearn your corpus
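The restore step above could look like this - a sketch of the usual backup/clear/restore cycle with sa-learn; the dump path is hypothetical, adjust it and run the commands as the Bayes DB's owner:

```shell
# Dump the current Bayes data (tokens and seen-IDs) to a portable
# text format before switching the backend.
sa-learn --backup > /tmp/bayes-backup.txt

# Remove the old DB_File-backed database files.
sa-learn --clear

# (Now add the bayes_store_module SDBM line to local.cf.)

# Re-import the data into the new SDBM-backed store.
sa-learn --restore /tmp/bayes-backup.txt
```

Alternatively, skip the backup/restore and simply relearn from your ham/spam corpus after clearing.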