Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

Matus UHLAR - fantomas Tue, 31 Oct 2017 13:06:06 -0700

On 31.10.17 01:35, David Gessel wrote:

amavisd-new-2.11.0_2,1
I'm finding the command /usr/local/bin/sa-learn --spam --showdots
/mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to

If you use amavis, you must train amavis' Bayes database
(/var/lib/amavis/.spamassassin/ here), not your own.
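A minimal sketch of training the amavis database from the command line. It assumes sa-learn's `--dbpath` option, which overrides `bayes_path`, and it assumes the Debian-style location quoted above with the usual `bayes` file prefix. Other platforms put amavis' home elsewhere, so adjust `AMAVIS_BAYES`. The function `learn_into_amavis` and the `SA_LEARN`/`AMAVIS_BAYES` variables are hypothetical. Run it as the user amavisd runs as. If you run it as root, any DB files sa-learn creates may end up owned by root and unwritable for amavis.

```shell
#!/bin/sh
# Hypothetical helper: learn one maildir folder into the Bayes DB that
# amavisd-new uses, rather than into the caller's ~/.spamassassin.
# ASSUMPTION: Debian-style amavis home; adjust AMAVIS_BAYES for your system.
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}
AMAVIS_BAYES=${AMAVIS_BAYES:-/var/lib/amavis/.spamassassin/bayes}

# Usage: learn_into_amavis ham|spam MAILDIR
learn_into_amavis() {
    case $1 in
        ham|spam) ;;
        *) echo "usage: learn_into_amavis ham|spam MAILDIR" >&2; return 2 ;;
    esac
    # --dbpath overrides bayes_path; cur/ and new/ are listed explicitly
    "$SA_LEARN" --dbpath "$AMAVIS_BAYES" "--$1" "$2/cur" "$2/new"
}
```

For example: `learn_into_amavis spam /mail/blackrosetech.com/gessel/.ManJunk`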


On 31.10.17 14:27, David Gessel wrote:

Huh, I was getting Bayes filter results, as I =think= I'm training a global
Bayes database per
https://wiki.apache.org/spamassassin/SiteWideBayesSetup


That is quite a dangerous setup if anyone else has access to your system, and
it is also useless when you use amavis.

I trained my DB years ago and rarely need to retrain it now.


Yes, I do understand that. These are the cron jobs I set up quite some time ago:
# learn ham and spam
17      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.archives.2017/{cur,new}
22      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.Sent/{cur,new}
27      3       *       *       0       root  /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
22      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Archives.2017/{cur,new}
32      3       *       *       0       root  /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/carolyn/.ManJunk/{cur,new}
37      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Sent/{cur,new}
55      3       *       *       0       root  /usr/local/bin/sa-learn --sync


That will kill your machine every time those jobs run, and unnecessarily.
But if it is really needed, I'd run them sequentially from one script.
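A minimal sketch of such a single script, reusing the folders and classes from the crontab above. `learn_folder`, `train_all` and the `SA_LEARN`/`MAILROOT` variables are hypothetical names. Running the jobs in turn avoids overlapping writes. The crontab above starts two jobs at 03:22, so those two run at the same time. Spelling out cur and new also avoids depending on `{cur,new}` brace expansion. POSIX sh does not require brace expansion, and cron normally runs commands with /bin/sh.

```shell
#!/bin/sh
# Hypothetical weekly training script: runs each sa-learn job in turn,
# so no two runs touch the Bayes DB at the same time.
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}
MAILROOT=${MAILROOT:-/mail/blackrosetech.com}

# Usage: learn_folder ham|spam MAILDIR
# Learns cur/ and new/ but defers the (slow) journal sync.
learn_folder() {
    "$SA_LEARN" "--$1" --no-sync "$2/cur" "$2/new"
}

# Same folders and classes as the crontab above, one after another.
train_all() {
    learn_folder ham  "$MAILROOT/gessel/.archives.2017"
    learn_folder ham  "$MAILROOT/gessel/.Sent"
    learn_folder spam "$MAILROOT/gessel/.ManJunk"
    learn_folder ham  "$MAILROOT/carolyn/.Archives.2017"
    learn_folder spam "$MAILROOT/carolyn/.ManJunk"
    learn_folder ham  "$MAILROOT/carolyn/.Sent"
    # Merge the journal into the database once, after all learning is done.
    "$SA_LEARN" --sync
}
```

Saved as a script that ends with a call to `train_all`, this replaces all seven crontab lines with a single entry, e.g. `17 3 * * 0 root /usr/local/sbin/sa-train-weekly` (hypothetical path).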

I disabled auto-learn because non-spam would occasionally end up classified as
spam, and I didn't want to train on that. The theory here was to wipe the
database,


Not needed; retraining usually helps very quickly. Retraining false positives
(especially those marked autolearn=spam) and false negatives (autolearn=ham)
is much better.
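A rough sketch of that targeted retraining. It assumes misclassified messages are collected in a maildir folder (hypothetical) and that SpamAssassin added its usual `X-Spam-Status` header with an `autolearn=` tag. Only messages whose autolearn went the wrong way are relearned. According to the sa-learn manual, a message previously learned as the opposite class is forgotten first and then relearned. `relearn_misclassified` and `SA_LEARN` are hypothetical names. grep matches the tag anywhere in the file, not only in the header, so treat this as a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: relearn only the messages whose autolearn= tag shows
# SpamAssassin trained them the wrong way.
#   relearn_misclassified ham  FOLDER  -> false positives (autolearn=spam)
#   relearn_misclassified spam FOLDER  -> false negatives (autolearn=ham)
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}

relearn_misclassified() {
    case $1 in
        ham)  wrong=spam ;;
        spam) wrong=ham ;;
        *) echo "usage: relearn_misclassified ham|spam MAILDIR" >&2; return 2 ;;
    esac
    # Heuristic: search whole files for the tag; -s hides errors for empty
    # cur/ or new/ globs. --null with xargs -0 keeps file names intact, and
    # xargs -r skips running sa-learn when nothing matched.
    grep -l -s --null "autolearn=$wrong" "$2"/cur/* "$2"/new/* |
        xargs -0 -r "$SA_LEARN" "--$1"
}
```

One sa-learn run handles every matching file, which avoids paying the rule-loading startup cost once per message.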

but then 24 hours later...

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham


Are you sure someone did not back up your spam DB?


Aside from the cron jobs above, no, but if they did that, then yes.


No idea how it could get lost then... maybe concurrent writes from the scripts
above?
But I think Berkeley DB should be resistant to this.

Would something like specifying the mailbox format also help?


Only if you use mbox format.


No, maildir. Not really relevant (I don't think), but:

dovecot2-2.2.31_1


Dovecot's antispam plugin could fix your problems:

https://wiki2.dovecot.org/Plugins/Antispam

Your users would then maintain the SA DB themselves.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
(R)etry, (A)bort, (C)ancer
