Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

David Gessel Wed, 01 Nov 2017 02:51:08 -0700


-------- Original Message --------
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per 
token really, really slow or roughly normal?
From: Matus UHLAR - fantomas <uh...@fantomas.sk>
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST)
>>> On 31.10.17 01:35, David Gessel wrote:
>>>> amavisd-new-2.11.0_2,1
>>>> I'm finding the command /usr/local/bin/sa-learn --spam --showdots
>>>> /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to
>
>>> if you use amavis, you must train amavis' bayes database
>>> (/var/lib/amavis/.spamassassin/ here), not your own.
>
> On 31.10.17 14:27, David Gessel wrote:
>> huh, I was getting bayes filter results, as I =think= I'm training a global 
>> bayes database per
>> https://wiki.apache.org/spamassassin/SiteWideBayesSetup
>
> that is quite dangerous setup if anyone has access to your system, and also
> useless when you use amavis.
I'll have to review the amavis config again.  I think the site-wide setup may
be redundant: the amavis setup was done some time ago and I believe it was
working, while the site-wide change to 0777 permissions was made more recently
during debugging and I think it is just a mistake.  The right answer is that
only the amavis user should own the bayes db, correct?
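
A quick way to check that would be something like the sketch below.  It is
only an illustration: the function name check_bayes_owner is made up, the
bayes_* file name pattern assumes the default DBM storage, and the directory
and user name have to be whatever amavis actually uses on the system in
question.

```shell
#!/bin/sh
# Sketch only: report Bayes files that are not owned by the expected user.
# check_bayes_owner is a hypothetical helper name; the bayes_* pattern
# assumes the default DBM storage (bayes_toks, bayes_seen, bayes_journal).
check_bayes_owner() {
    dir="$1"     # directory holding the Bayes files
    want="$2"    # user name that should own them
    bad=0
    for f in "$dir"/bayes_*; do
        [ -e "$f" ] || continue
        # Third field of "ls -ld" output is the owner name.
        owner=$(ls -ld "$f" | awk '{print $3}')
        if [ "$owner" != "$want" ]; then
            echo "$f is owned by $owner, expected $want"
            bad=1
        fi
    done
    return "$bad"
}
```

For example, check_bayes_owner /var/lib/amavis/.spamassassin amavis, with
the path and user adjusted to the local install.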


>
>>> I have trained my DB years ago and I rarely need new training now.
>>
>> Yes, I do understand that.  The cron jobs I set up quite some time ago
>> # learn ham and spam
>> 17      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.archives.2017/{cur,new}
>> 22      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.Sent/{cur,new}
>> 27      3       *       *       0       root  /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
>> 22      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Archives.2017/{cur,new}
>> 32      3       *       *       0       root  /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/carolyn/.ManJunk/{cur,new}
>> 37      3       *       *       0       root  /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Sent/{cur,new}
>> 55      3       *       *       0       root  /usr/local/bin/sa-learn --sync
>
> that will kill your machine each night. unnecessarily.
> but if really needed, I'd run them sequentially from one script.
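
For reference, the "run them sequentially from one script" suggestion could
be sketched roughly as below.  This is only an illustration: the function
name learn_all and the SA_LEARN and MAILROOT override variables are made up
here, and the folder list is copied from the crontab above.  The cur and new
directories are spelled out because brace expansion such as {cur,new} is not
a POSIX sh feature.

```shell
#!/bin/sh
# Sketch only: train each folder one after another, then sync once.
# SA_LEARN and MAILROOT are hypothetical override variables.
SA_LEARN="${SA_LEARN:-/usr/local/bin/sa-learn}"
MAILROOT="${MAILROOT:-/mail/blackrosetech.com}"

learn_all() {
    # Each input line is "<ham|spam> <maildir folder relative to MAILROOT>".
    while read -r class folder; do
        [ -n "$class" ] || continue
        # stdin is redirected so the command cannot consume the folder list.
        "$SA_LEARN" "--$class" --no-sync \
            "$MAILROOT/$folder/cur" "$MAILROOT/$folder/new" </dev/null || return 1
    done <<EOF
ham gessel/.archives.2017
ham gessel/.Sent
spam gessel/.ManJunk
ham carolyn/.Archives.2017
spam carolyn/.ManJunk
ham carolyn/.Sent
EOF
    # One journal sync after every folder has been processed.
    "$SA_LEARN" --sync
}
```

A single weekly crontab entry could then source this file and call learn_all
in place of the seven separate entries, so no two sa-learn runs overlap.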

The thing is, and this may be a big hint, it doesn't kill the machine at all.
It barely generates any load, 0.2 at most.
sa-learn is running at the moment at 0.0% of CPU and 0.2% of memory, with a
total time of 1.22.03 over 24 hours.

I think something may be blocking the process in some way - either db locks or
something else.  No?
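
One way to look for that would be to check the Bayes directory for leftover
lock files, roughly as sketched below.  The function name bayes_lock_report
is made up, and the file names bayes.lock* and bayes.mutex are my assumption
about what the default DBM storage uses for locking, so treat the list as a
starting point rather than a definitive one.

```shell
#!/bin/sh
# Sketch only: list lock files sitting next to the Bayes database.
# bayes_lock_report is a hypothetical helper; the lock file names are
# an assumption about the default DBM storage.
BAYES_DIR="${BAYES_DIR:-/var/lib/amavis/.spamassassin}"

bayes_lock_report() {
    dir="${1:-$BAYES_DIR}"
    found=0
    for f in "$dir"/bayes.lock* "$dir"/bayes.mutex; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] || continue
        found=1
        echo "lock file present: $f"
    done
    [ "$found" -eq 1 ] || echo "no lock files in $dir"
}
```

If a lock file shows up while no sa-learn or amavis process is touching the
database, it is probably stale.  Running sa-learn with -D should also show
where the time is going.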

>
>> I disabled auto-learn because non-spam would occasionally get through to
>> spam and I didn't want to train on that.  The theory here was to wipe the
>> database,
>
> not needed, re-training helps very fast usually. re-training false positives
> (especially those marked autolearn=spam) and false negatives (autolearn=ham)
> is much better.
>
>>>> but then 24 hours later...
>>>>
>>>> # sa-learn --dump magic
>>>> 0.000          0          3          0  non-token data: bayes db version
>>>> 0.000          0          0          0  non-token data: nspam
>>>> 0.000          0          0          0  non-token data: nham
>>>
>>> are you sure someone did not back up your spam DB
>>
>> Aside from the cron jobs above, no, but if they did that, then yes.
>
> no idea how can it get lost then... maybe concurrent writes from the scripts
> above?
> But I think Berkeley DB should be resistant against this.
>
>>>> Would something like specifying the mailbox format also help?
>>>
>>> only if you use mbox format.
>>
>> No, maildir.  Not really relevant (I don't think) but:
>>
>> dovecot2-2.2.31_1
>
> dovecot's antispam plugin could fix your problems
>
> https://wiki2.dovecot.org/Plugins/Antispam
>
> your users would maintain the SA DB themselves.
>

This looks like a great plugin and I'd be happy to use it, but I don't know if 
it will help if sa-learn is so slow.  Something definitely isn't right...  I've 
done something dumb somewhere and I'm not sure what or where.
