Re: Bayes advanced questions

Michael Monnerie Thu, 11 May 2006 09:17:51 -0700

On Thursday, 11 May 2006 08:06 Matt Kettler wrote:
> > And tonight's expiry for server #1:
> > bayes: synced databases from journal in 11 seconds: 1968 unique
> > entries (3059 total entries)
> That's the journal sync, not the expiry part. The expiry part takes
> much longer.


It comes from "sa-learn --force-expire --sync". How can I see when it 
actually expires something? Could it be that no expiry runs because 
ntokens is still below 2 million?

On http://wiki.apache.org/spamassassin/BayesForceExpire it says you 
should stop SA before --force-expire. Is that a must or a 
recommendation? The man page doesn't ask for it.

> >> score used is the score the message would have got if:
> >>    bayes was disabled
> >>    the AWL was disabled
> >>    no userconf (ie:black/whitelists) rules were enabled.
> >
> > That's good info which should be in the man page.
>
> It is.. In SA 3.1.x it's in the docs for the autolearn threshold
> plugin:
>
> http://spamassassin.apache.org/full/3.1.x/dist/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

Not really. It doesn't mention that bayes/AWL/userconf are not counted.

> I looked into the code for SA 3.1.0's PerMsgStatus.pm  and
> Plugin/AutoLearnThreshold.pm.
>
> The limitation is actually done by computing score of the bayes
> rules, not the actual bayes percentage.
>
> Learning as ham will be inhibited if the score of the "learn" rules
> (ie: bayes) totals more than +1.0.
> Learning as spam will be inhibited if the score of the "learn" rules
> (ie: bayes) totals less than -1.0.
>
> Note: by "learn" rules, I mean rules declared with the "learn" tflag,
> which at this time is just bayes.
>
> So in SA 3.1.0, existing training ranking BAYES_00 and BAYES_05 will
> inhibit spam learning.
> BAYES_60 or higher will inhibit ham learning.

This is very good info and would be worth documenting in the man page 
or wiki. I could update the wiki, but I don't believe I'm qualified 
enough.
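
To make the rule Matt describes concrete, here is a rough sketch in Python. 
It is not the SA code (that is Perl, in Plugin/AutoLearnThreshold.pm); the 
function and parameter names are made up for illustration. The 0.1 and 12.0 
defaults are the documented defaults of bayes_auto_learn_threshold_nonspam 
and bayes_auto_learn_threshold_spam. The extra requirement of some header 
and body points for spam learning is left out.

```python
from typing import Optional

# Limits quoted in the thread for the total score of "learn"-tflag rules.
LEARN_RULES_BLOCK_HAM_ABOVE = 1.0    # more than +1.0 inhibits ham learning
LEARN_RULES_BLOCK_SPAM_BELOW = -1.0  # less than -1.0 inhibits spam learning


def autolearn_decision(
    autolearn_score: float,
    learn_rule_score: float,
    ham_threshold: float = 0.1,    # bayes_auto_learn_threshold_nonspam default
    spam_threshold: float = 12.0,  # bayes_auto_learn_threshold_spam default
) -> Optional[str]:
    """Return "ham", "spam", or None (no auto-learning).

    autolearn_score:  score computed without bayes, AWL and userconf rules.
    learn_rule_score: summed score of rules with the "learn" tflag
                      (currently just the BAYES_* rules).
    """
    if autolearn_score < ham_threshold:
        # Ham candidate, unless bayes already leans towards spam.
        if learn_rule_score > LEARN_RULES_BLOCK_HAM_ABOVE:
            return None
        return "ham"
    if autolearn_score >= spam_threshold:
        # Spam candidate, unless bayes already leans towards ham.
        if learn_rule_score < LEARN_RULES_BLOCK_SPAM_BELOW:
            return None
        return "spam"
    # Between the two thresholds nothing is learned.
    return None
```

So a message with an auto-learn score of 15 whose BAYES rule contributes 
less than -1.0 is not learned as spam, which matches the BAYES_00/BAYES_05 
case above.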

For example, the man page says:
* Also note that auto-learning occurs using scores from either scoreset
* 0 or 1

But who except the devs knows what scoreset 0 or 1 is?
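
As far as I understand it, the four values on a "score" line are the four 
scoresets: 0 = bayes off/net off, 1 = bayes off/net on, 2 = bayes on/net 
off, 3 = bayes on/net on. A small Python sketch of that mapping, with 
function names invented for illustration and the "drop the bayes bit" step 
being my reading of the man page sentence, not a quote from the SA code:

```python
def scoreset_index(bayes_enabled: bool, net_enabled: bool) -> int:
    """Scoreset in use for a given configuration (0 to 3)."""
    return (2 if bayes_enabled else 0) + (1 if net_enabled else 0)


def autolearn_scoreset(active_scoreset: int) -> int:
    """Scoreset used for the auto-learn decision.

    Assumption: auto-learning uses the non-bayes counterpart of the
    active set, so set 3 maps to 1 and set 2 maps to 0.
    """
    return active_scoreset & ~2
```

With bayes and network tests both on, the message is scored with set 3, 
but the auto-learn decision would use the scores from set 1.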

For people with several MXs this is also good info:
http://wiki.apache.org/spamassassin/BayesBitMe

It explains why a second MX often generates FPs.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660/4156531                          .network.your.ideas.
// PGP Key:   "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net                 Key-ID: 0x55CBA4EE
