Re: Bayes advanced questions

Matt Kettler Thu, 11 May 2006 11:00:53 -0700

Michael Monnerie wrote:
> On Thursday, 11 May 2006 08:06 Matt Kettler wrote:
>>> And tonight's expiry for server #1:
>>> bayes: synced databases from journal in 11 seconds: 1968 unique
>>> entries (3059 total entries)
>> That's the journal sync, not the expiry part. The expiry part takes
>> much longer.
> 
> It comes from "sa-learn --force-expire --sync".


First, adding --sync is redundant. --force-expire implies --sync because it
would be foolish for SA to attempt expiry without syncing first.

It won't hurt anything, though.


> How could I see when it
> expires something? Could it be that no expire happens because ntokens
> has not yet reached 2 million?


It's almost certainly going to run an expire. I'm just pointing out that those
11 seconds are NOT a part of the expire. They're just how long the sync part 
took.

Your expiry will take much longer. On server 1 with such a large bayes DB it
could take 10 minutes or more.

That said, SA should report details of the expiry right after the sync...


# sa-learn --force-expire --sync
bayes: synced databases from journal in 1 seconds: 938 unique entries (986 total entries)
expired old bayes database entries in 118 seconds
214732 entries kept, 1312 deleted
token frequency: 1-occurrence tokens: 2.50%
token frequency: less than 8 occurrences: 71.89%
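
If you want to check from a script whether an expire actually ran, here is a
minimal Python sketch. It assumes the output looks exactly like the sample
above and that each message arrives on a single, unwrapped line. The function
name summarize_sa_learn and the regular expressions are my own, not part of
SA, and capturing the text from sa-learn is left to you.

```python
import re

# Patterns are assumptions based only on the sample output in this message.
SYNC_RE = re.compile(
    r"synced databases from journal in (\d+) seconds: "
    r"(\d+) unique entries \((\d+) total entries\)"
)
EXPIRE_RE = re.compile(r"expired old bayes database entries in (\d+) seconds")
KEPT_RE = re.compile(r"(\d+) entries kept, (\d+) deleted")


def summarize_sa_learn(output):
    """Return the sync and expire figures found in captured sa-learn output.

    Any value that is still None afterwards was not reported in the output.
    """
    summary = {"sync_seconds": None, "expire_seconds": None,
               "kept": None, "deleted": None}
    for line in output.splitlines():
        match = SYNC_RE.search(line)
        if match:
            summary["sync_seconds"] = int(match.group(1))
            continue
        match = EXPIRE_RE.search(line)
        if match:
            summary["expire_seconds"] = int(match.group(1))
            continue
        match = KEPT_RE.search(line)
        if match:
            summary["kept"] = int(match.group(1))
            summary["deleted"] = int(match.group(2))
    return summary


sample = """\
bayes: synced databases from journal in 1 seconds: 938 unique entries (986 total entries)
expired old bayes database entries in 118 seconds
214732 entries kept, 1312 deleted
"""
print(summarize_sa_learn(sample))
```

If expire_seconds comes back as None, the output only contained the journal
sync line and no expiry report.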





> 
> On http://wiki.apache.org/spamassassin/BayesForceExpire it says you 
> should stop SA before --force-expire, is that a must or a 
> recommendation? 

It's a recommendation. If SA is still running and is in the middle of
auto-learning, sa-learn will have to wait for it to finish before it can lock
the DB R/W.

The man page doesn't ask for it.
> 
>>>> score used is the score the message would have got if:
>>>>    bayes was disabled
>>>>    the AWL was disabled
>>>>    no userconf (ie:black/whitelists) rules were enabled.
>>> That's good info which should be in the man page.
>> It is.. In SA 3.1.x it's in the docs for the autolearn threshold
>> plugin:
>>
>> http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_
>> Plugin_AutoLearnThreshold.html
> 
> Not really. It doesn't mention that bayes/awl/userconf are not counted.
> 
>> I looked into the code for SA 3.1.0's PerMsgStatus.pm  and
>> Plugin/AutoLearnThreshold.pm.
>>
>> The limitation is actually done by computing score of the bayes
>> rules, not the actual bayes percentage.
>>
>> Learning as ham will be inhibited if the score of the "learn" rules
>> (ie: bayes) totals more than +1.0.
>> Learning as spam will be inhibited if the score of the "learn" rules
>> (ie: bayes) totals less than -1.0.
>>
>> Note: by "learn" rules, I mean rules declared with the "learn" tflag,
>> which at this time is just bayes.
>>
>> So in SA 3.1.0, existing training ranking BAYES_00 and BAYES_05 will
>> inhibit spam learning.
>> BAYES_60 or higher will inhibit ham learning.
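
To make the quoted rule concrete, here is a small Python sketch of the check
as I described it. It is not SA's actual code: the function name
blocked_learning and the example scores are hypothetical, and only the
+1.0 / -1.0 limits come from the text above.

```python
def blocked_learning(learn_rule_score):
    """Return the kinds of auto-learning inhibited by the "learn" rules.

    learn_rule_score is the summed score of all rules carrying the
    "learn" tflag (at this time, just the BAYES_* rules).
    """
    blocked = set()
    if learn_rule_score > 1.0:
        # Existing training already leans spam, so don't learn as ham.
        blocked.add("ham")
    if learn_rule_score < -1.0:
        # Existing training already leans ham, so don't learn as spam.
        blocked.add("spam")
    return blocked


# Hypothetical scores, for illustration only; real rule scores depend on
# the SA version and the active scoreset.
print(blocked_learning(-2.5))  # a strongly hammy bayes result
print(blocked_learning(0.0))   # a neutral bayes result
print(blocked_learning(3.0))   # a strongly spammy bayes result
```

A score between -1.0 and +1.0 inclusive blocks nothing, so the usual
autolearn thresholds alone decide.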
> 
> This is very good info and would be nice to document in the man page
> or wiki. I could update the wiki, but I don't believe I'm qualified enough.
> 
> For example, the man page says:
> * Also note that auto-learning occurs using scores from either scoreset
> * 0 or 1
> 
> But who except the devs knows what's scoreset 0 or 1?

It's in the manpage, under the description of the "score" keyword.

---
If four valid scores are listed, then the score that is used depends on how
SpamAssassin is being used. The first score is used when both Bayes and network
tests are disabled (score set 0). The second score is used when Bayes is
disabled, but network tests are enabled (score set 1). The third score is used
when Bayes is enabled and network tests are disabled (score set 2). The fourth
score is used when Bayes is enabled and network tests are enabled (score set 3).
---
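
Put another way, the four scores are indexed by two on/off switches. A short
Python sketch of that mapping follows; the function names and the example
score list are mine, purely for illustration. The autolearn helper reflects
the man page statement that auto-learning uses scoreset 0 or 1, which are
the two Bayes-disabled sets.

```python
def score_set(bayes_enabled, network_enabled):
    """Return the score set index (0-3) described in the man page text."""
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def autolearn_score_set(network_enabled):
    """Auto-learning uses set 0 or 1, i.e. scores as if Bayes were disabled."""
    return score_set(False, network_enabled)


def pick_score(scores, bayes_enabled, network_enabled):
    """Select one of four listed scores for the current configuration."""
    return scores[score_set(bayes_enabled, network_enabled)]


# Hypothetical four-value score line, not taken from any real rule.
example_scores = [1.0, 2.0, 3.0, 4.0]
print(pick_score(example_scores, bayes_enabled=True, network_enabled=True))
print(autolearn_score_set(network_enabled=True))
```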

> 
> For people with several MXs this is good info also:
> http://wiki.apache.org/spamassassin/BayesBitMe
> 
> It explains why a 2nd MX often generates FPs.
> 
> Regards, zmi
