-------- Original Message --------
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?
From: Matus UHLAR - fantomas <uh...@fantomas.sk>
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST)
>>> On 31.10.17 01:35, David Gessel wrote:
>>>> amavisd-new-2.11.0_2,1
>>>> I'm finding the command /usr/local/bin/sa-learn --spam --showdots
>>>> /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to
>
>>> if you use amavis, you must train amavis' bayes database
>>> (/var/lib/amavis/.spamassassin/ here), not your own.
>
> On 31.10.17 14:27, David Gessel wrote:
>> huh, I was getting bayes filter results, as I =think= I'm training a global
>> bayes database per
>> https://wiki.apache.org/spamassassin/SiteWideBayesSetup
>
> that is quite a dangerous setup if anyone has access to your system, and it is
> also useless when you use amavis.
I'll have to review the amavis config again. I think the site-wide setup is
probably redundant: amavis was set up some time ago and I think it was working,
while the site-wide change to 0777 permissions was made more recently while
debugging and is, I think, just a mistake. The right answer is that only the
amavis user should own the bayes db, correct?
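A quick way to see whether the site-wide change left the Bayes files world-writable is to list them with find. This is a minimal sketch: it assumes the database files use the default "bayes" prefix (bayes_toks, bayes_seen, ...) and takes the directory as an argument (for example /var/lib/amavis/.spamassassin, the path mentioned above, which may differ here). The user name in the commented fix is hypothetical; use whichever account amavisd actually runs as.

```sh
#!/bin/sh
# List Bayes DB files in directory $1 that any user can write to.
# ASSUMPTION: the files use the default "bayes" prefix (bayes_toks, bayes_seen, ...).
world_writable_bayes() {
    find "$1" -maxdepth 1 -name 'bayes*' -type f -perm -0002
}

# Example fix, run as root once the right owner is known. "amavis" is a
# hypothetical user name; substitute the account amavisd runs as.
#   chown amavis:amavis /var/lib/amavis/.spamassassin/bayes*
#   chmod 0600 /var/lib/amavis/.spamassassin/bayes*
```

Empty output means no Bayes file in that directory is writable by everyone; `ls -l` on the same files shows the current owner. `-maxdepth` is not POSIX but is supported by both GNU and BSD find.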
>
>>> I have trained my DB years ago and I rarely need new training now.
>>
>> Yes, I do understand that. The cron jobs I set up quite some time ago:
>> # learn ham and spam
>> 17 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.archives.2017/{cur,new}
>> 22 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/gessel/.Sent/{cur,new}
>> 27 3 * * 0 root /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
>> 22 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Archives.2017/{cur,new}
>> 32 3 * * 0 root /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/carolyn/.ManJunk/{cur,new}
>> 37 3 * * 0 root /usr/local/bin/sa-learn --ham --no-sync /mail/blackrosetech.com/carolyn/.Sent/{cur,new}
>> 55 3 * * 0 root /usr/local/bin/sa-learn --sync
>
> that will kill your machine each night, unnecessarily.
> But if it's really needed, I'd run them sequentially from one script.
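Following the suggestion to run the jobs sequentially from one script, a sketch could look like this. Assumptions, all adjustable: it runs from the crontab of the user amavis runs as, so it trains amavis' database through sa-learn's --dbpath option; the default BAYES_DIR is the /var/lib/amavis/.spamassassin path mentioned earlier in the thread, which may not match this system; and that user can read the maildirs. The cur/ and new/ paths are written out instead of {cur,new} because brace expansion is a bash/csh feature, not part of POSIX sh, which is what cron normally runs commands with.

```sh
#!/bin/sh
# Weekly Bayes training: one sa-learn at a time, one journal sync at the end.
# ASSUMPTIONS: run as the amavis user; BAYES_DIR and MAIL_ROOT are examples.
set -u

SA_LEARN="${SA_LEARN:-/usr/local/bin/sa-learn}"
BAYES_DIR="${BAYES_DIR:-/var/lib/amavis/.spamassassin}"
MAIL_ROOT="${MAIL_ROOT:-/mail/blackrosetech.com}"

# learn --ham|--spam FOLDER: train one maildir folder without syncing.
learn() {
    "$SA_LEARN" --dbpath "$BAYES_DIR" --no-sync "$1" "$2/cur" "$2/new" \
        || echo "sa-learn $1 failed for $2" >&2
}

train_all() {
    learn --ham  "$MAIL_ROOT/gessel/.archives.2017"
    learn --ham  "$MAIL_ROOT/gessel/.Sent"
    learn --spam "$MAIL_ROOT/gessel/.ManJunk"
    learn --ham  "$MAIL_ROOT/carolyn/.Archives.2017"
    learn --spam "$MAIL_ROOT/carolyn/.ManJunk"
    learn --ham  "$MAIL_ROOT/carolyn/.Sent"
    # Fold the journal into the database once, after all folders are done.
    "$SA_LEARN" --dbpath "$BAYES_DIR" --sync \
        || echo "sa-learn --sync failed" >&2
}

train_all
```

Because each sa-learn finishes before the next starts, no two of them write to the database at once, which also rules out the concurrent-write theory raised below. A failure in one folder is logged to stderr and the remaining folders still run.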
The thing is, and this may be a big hint, it doesn't kill the machine at all.
It barely generates any load, about 0.2 at most.
sa-learn is running at the moment at 0.0% CPU and 0.2% memory, with a total
time of 1.22.03 over 24 hours.
I think something may be locking the process in some way, either DB locks or
something else. No?
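To test the locking theory, one thing to look for is a leftover lock file next to the Bayes database. This sketch assumes SpamAssassin's default naming, where the lock file is the bayes_path prefix plus ".lock" (typically bayes.lock, possibly with per-host or per-process variants beside it); the 10-minute threshold is an arbitrary choice. Running sa-learn with -D prints debug output, which should also show whether it is waiting on a lock.

```sh
#!/bin/sh
# Print Bayes lock files in directory $1 not modified for more than $2 minutes
# (default 10).
# ASSUMPTION: lock files are named bayes.lock* (SpamAssassin's default prefix).
stale_bayes_locks() {
    find "$1" -maxdepth 1 -name 'bayes.lock*' -type f -mmin +"${2:-10}"
}
```

A lock file should only be removed once no sa-learn or amavisd process is still using that database. `-mmin` and `-maxdepth` work in GNU and BSD find but are not POSIX.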
>
>> I disabled auto-learn because ham would occasionally be classified as
>> spam, and I didn't want to train on that. The theory here was to wipe the
>> database,
>
> not needed; re-training usually helps very quickly. Re-training false
> positives (especially those marked autolearn=spam) and false negatives
> (autolearn=ham) is much better.
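Finding the messages that were auto-learned into the wrong class can be scripted by reading the X-Spam-Status header, where SpamAssassin records the autolearn= result, often on a folded continuation line. A minimal sketch, assuming that header is present in the stored messages (whether amavis adds it depends on its configuration) and that you point it at a folder whose contents you have already checked by hand. The function name autolearned_as is made up for this example.

```sh
#!/bin/sh
# autolearned_as DIR spam|ham: print maildir messages in DIR/cur and DIR/new
# whose X-Spam-Status header (including folded continuation lines) contains
# autolearn=spam or autolearn=ham. Only the header block is searched.
autolearned_as() {
    for f in "$1"/cur/* "$1"/new/*; do
        [ -f "$f" ] || continue
        if awk -v want="autolearn=$2" '
            /^\r?$/ { exit }                      # blank line ends the header
            /^[^ \t]/ { in_status = (tolower($0) ~ /^x-spam-status:/) }
            in_status && index($0, want) { found = 1; exit }
            END { exit !found }
        ' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `autolearned_as /mail/blackrosetech.com/gessel/.archives.2017 spam | xargs sa-learn --ham` would re-learn, as ham, archived mail that had been auto-learned as spam (add --dbpath when training amavis' database). This relies on maildir file names containing no spaces, which is normally the case.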
>
>>>> but then 24 hours later...
>>>>
>>>> # sa-learn --dump magic
>>>> 0.000 0 3 0 non-token data: bayes db version
>>>> 0.000 0 0 0 non-token data: nspam
>>>> 0.000 0 0 0 non-token data: nham
>>>
>>> are you sure someone did not back up your spam DB?
>>
>> Aside from the cron jobs above, no, but if they did that, then yes.
>
> no idea how it could get lost then... maybe concurrent writes from the scripts
> above?
> But I think Berkeley DB should be resistant to this.
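One way to confirm whether training is reaching the database amavis uses is to read the nspam and nham counters from `sa-learn --dump magic`, run against that database (for example with --dbpath). By default SpamAssassin does not use Bayes until it has learned at least 200 spam and 200 ham (the bayes_min_spam_num and bayes_min_ham_num settings), so this sketch exits non-zero below that. It assumes the count is the third column, as in the output quoted above; bayes_counts is a hypothetical helper name.

```sh
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and report the spam/ham counts.
# Exit status is 1 if either count is below SpamAssassin's default minimum of
# 200 (bayes_min_spam_num / bayes_min_ham_num); adjust if you changed those.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { s = $3 }
        /non-token data: nham$/  { h = $3 }
        END {
            printf "nspam=%d nham=%d\n", s, h
            exit (s < 200 || h < 200)
        }'
}
```

Usage: `sa-learn --dbpath /var/lib/amavis/.spamassassin --dump magic | bayes_counts`. Running it right after the weekly training and again a day later would show whether the counts are really being reset, as in the dump above.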
>
>>>> Would something like specifying the mailbox format also help?
>>>
>>> only if you use mbox format.
>>
>> No, maildir. Not really relevant (I don't think) but:
>>
>> dovecot2-2.2.31_1
>
> Dovecot's antispam plugin could fix your problems
>
> https://wiki2.dovecot.org/Plugins/Antispam
>
> your users would maintain the SA DB themselves.
>
This looks like a great plugin and I'd be happy to use it, but I don't know
whether it will help while sa-learn is this slow. Something definitely isn't
right... I've done something dumb somewhere and I'm not sure what or where.