-------- Original Message --------
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?
From: Matus UHLAR - fantomas <uh...@fantomas.sk>
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST)

>>> On 31.10.17 01:35, David Gessel wrote:
>>>> amavisd-new-2.11.0_2,1
>>>> I'm finding the command /usr/local/bin/sa-learn --spam --showdots
>>>> /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to
>
>>> if you use amavis, you must train amavis' bayes database
>>> (/var/lib/amavis/.spamassassin/ here), not your own.
>
> On 31.10.17 14:27, David Gessel wrote:
>> huh, I was getting bayes filter results, as I =think= I'm training a global
>> bayes database per
>> https://wiki.apache.org/spamassassin/SiteWideBayesSetup
>
> that is quite dangerous setup if anyone has access to your system, and also
> useless when you use amavis.

I'll have to review the amavis config again. I think perhaps it is redundant, the amavis setup was done some time ago and I think it was working, the site-wide mod to 0777 permissions was more recently in debugging and I think is just a mistake. The right answer is that just the amavis user is the owner of the bayes db, correct?
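
For anyone checking the same thing, here is a minimal sketch of how the current ownership and permissions of the Bayes files could be inspected. It only reports; it changes nothing. The default directory is the one Matus quotes for his own system and is an assumption for any other install (packages differ in where amavis keeps its home and under which account it runs), and the function name check_bayes_perms is made up for this example.

```shell
#!/bin/sh
# Sketch only: report who owns the Bayes files and list world-writable ones.
# BAYES_DIR defaults to the path quoted above; it is an assumption for any
# other system and can be overridden from the environment.
BAYES_DIR="${BAYES_DIR:-/var/lib/amavis/.spamassassin}"

# check_bayes_perms DIR (hypothetical helper name)
check_bayes_perms() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Owner, group and mode of the directory and of each bayes_* file in it.
    ls -ld "$dir" "$dir"/bayes_* 2>/dev/null
    # Print, one per line, the bayes_* files that any local user may write to
    # (for example files left at 0777 after debugging).
    find "$dir" -name 'bayes_*' -perm -0002 -print
}

# Only run the check when the directory actually exists on this machine.
if [ -d "$BAYES_DIR" ]; then
    check_bayes_perms "$BAYES_DIR"
fi
```

If the last part of the output lists any files, those are writable by every local account, which is the situation described as dangerous above.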
>
>>> I have trained my DB years ago and I rarely need new training now.
>>
>> Yes, I do understand that. The cron jobs I set up quite some time ago
>> # learn ham and spam
>> 17 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.archives.2017/{cur,new}
>> 22 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.Sent/{cur,new}
>> 27 3 * * 0 root /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
>> 22 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Archives.2017/{cur,new}
>> 32 3 * * 0 root /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/carolyn/.ManJunk/{cur,new}
>> 37 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Sent/{cur,new}
>> 55 3 * * 0 root /usr/local/bin/sa-learn --sync
>
> that will kill your machine each night. unnecessarily.
> but if really needed, I'd run them sequentially from one script.

The thing is, and this may be a big hint, it doesn't kill the machine at all. It barely generates any load, 0.2 max or so. sa-learn is running at the moment at 0.0% of cpu, 0.2% of mem, total time 1.22.03 over 24 hours. I think something may be locking the process in some way - either db locks or something else. No?

>
>> I disabled auto-learn because non-spam would occasionally get through to
>> spam and I didn't want to train on that. The theory here was to wipe the
>> database,
>
> not needed, re-training helps very fast usually. re-training false positives
> (especially those marked autolearn=spam) and false negatives (autolearn=ham)
> is much better.
>
>>>> but then 24 hours later...
>>>>
>>>> # sa-learn --dump magic
>>>> 0.000          0          3          0  non-token data: bayes db version
>>>> 0.000          0          0          0  non-token data: nspam
>>>> 0.000          0          0          0  non-token data: nham
>>>
>>> are you sure someone did not back up your spam DB
>>
>> Aside from the cron jobs above, no, but if they did that, then yes.
>
> no idea how can it get lost then... maybe concurrent writes from the scripts
> above?
> But I think Berkeley DB should be resistant against this.
>
>>>> Would something like specifying the mailbox format also help?
>>>
>>> only if you use mbox format.
>>
>> No, maildir. Not really relevant (I don't think) but:
>>
>> dovecot2-2.2.31_1
>
> dovecot's antispam plugin could fix your problems
>
> https://wiki2.dovecot.org/Plugins/Antispam
>
> your users would maintain the SA DB themselves.
>

This looks like a great plugin and I'd be happy to use it, but I don't know if it will help if sa-learn is so slow. Something definitely isn't right... I've done something dumb somewhere and I'm not sure what or where.
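
As a rough sketch of the "run them sequentially from one script" suggestion above, something along these lines could replace the seven cron entries with one. The folder names and the sa-learn path are taken from the crontab quoted earlier. The lock directory, the function names and the script layout are assumptions made for illustration. It does not address which Bayes database or which user the training should run as; that is a separate question in this thread.

```shell
#!/bin/sh
# Sketch only: run every sa-learn job one after another, then sync once.
SA_LEARN="${SA_LEARN:-/usr/local/bin/sa-learn}"
MAILROOT="${MAILROOT:-/mail/blackrosetech.com}"
LOCKDIR="${LOCKDIR:-/tmp/sa-learn-all.lock}"   # hypothetical lock location

# learn_dir TYPE MAILDIR: feed cur/ and new/ of one Maildir folder to sa-learn.
# TYPE is "ham" or "spam". cur and new are spelled out because {cur,new}
# brace expansion is not part of POSIX sh, so a plain /bin/sh may pass the
# braces through literally.
learn_dir() {
    "$SA_LEARN" "--$1" --no-sync "$2/cur" "$2/new"
}

learn_all() {
    # mkdir is atomic, so it serves as a simple lock against overlapping runs.
    # If the script is killed midway the directory stays and must be removed
    # by hand.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another run holds $LOCKDIR, skipping" >&2
        return 1
    fi
    learn_dir ham  "$MAILROOT/gessel/.archives.2017"
    learn_dir ham  "$MAILROOT/gessel/.Sent"
    learn_dir spam "$MAILROOT/gessel/.ManJunk"
    learn_dir ham  "$MAILROOT/carolyn/.Archives.2017"
    learn_dir spam "$MAILROOT/carolyn/.ManJunk"
    learn_dir ham  "$MAILROOT/carolyn/.Sent"
    # One sync at the end, matching the final cron entry.
    "$SA_LEARN" --sync
    rmdir "$LOCKDIR"
}

# Only start when sa-learn is actually installed at the configured path.
if [ -x "$SA_LEARN" ]; then
    learn_all
fi
```

A single crontab line pointing at wherever the script is saved would then take the place of the separate entries, so no two sa-learn processes write to the database at the same time.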