-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Ricardo,

Wednesday, August 20, 2003, 9:02:12 PM, you wrote:

RK> I'm wondering how it is possible to keep bayes "fresh" with
RK> both spam and ham, and how SA can do
RK> auto-learning?

RK> How exactly does SA automatically provide ham to bayes?

First, SA scores the email using whatever tools and rules are specified in
your configuration. That's the score you see in the email headers when SA
is done.

SA then throws away any Bayes score and any Blacklist or Whitelist score,
and compares the remaining score against conservatively set thresholds.
If that score is higher than the auto-learn spam threshold (15 above the
normal spam threshold, I think), SA considers the message so very likely
to be spam that it auto-learns it as spam.  If the score is lower than the
auto-learn ham threshold (I think this is -2), SA considers it so very
likely to be ham that it auto-learns it as ham.  SA will not auto-learn
anything that scores between these two thresholds.
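For illustration, here is a rough Python sketch of that decision. It is not SpamAssassin's actual code: the threshold values are only the "I think" figures given above (with the normal spam threshold assumed to be 5.0), and the rule-name prefixes used to spot Bayes and white/blacklist hits are illustrative assumptions.

```python
from typing import Dict, Optional

# Assumed values: the "I think" figures from the text, not confirmed
# SpamAssassin defaults. Check your own version's documentation.
REQUIRED_SCORE = 5.0                      # assumed normal spam threshold
AUTO_LEARN_SPAM = REQUIRED_SCORE + 15.0   # "15 above the normal spam threshold"
AUTO_LEARN_HAM = -2.0                     # "I think this is -2"

# Rules ignored when computing the auto-learn score (illustrative names).
IGNORED_PREFIXES = ("BAYES_", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")


def auto_learn_score(hits: Dict[str, float]) -> float:
    """Sum the rule scores, dropping Bayes and white/blacklist contributions."""
    return sum(score for rule, score in hits.items()
               if not rule.startswith(IGNORED_PREFIXES))


def auto_learn_decision(hits: Dict[str, float]) -> Optional[str]:
    """Return "spam" or "ham" if the message is safe to auto-learn, else None."""
    score = auto_learn_score(hits)
    if score > AUTO_LEARN_SPAM:
        return "spam"
    if score < AUTO_LEARN_HAM:
        return "ham"
    return None  # the uncertain middle band is never auto-learned


# A whitelisted message: the -100 is ignored, leaving 1.0, so nothing is learned.
print(auto_learn_decision({"USER_IN_WHITELIST": -100.0, "SOME_RULE": 1.0}))  # prints None
```

The point of discarding the Bayes score before comparing is that Bayes never gets to vote on its own training data, so a mistrained database can't reinforce itself.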

RK> I imagined sa-learn would have to be fed ham and spam
RK> manually, otherwise if it is done automatically, wouldn't it
RK> be erroneously counting spam as ham, in the case of
RK> false-negatives, and erroneously be taught ham when there
RK> are false-positives?

Yes, which is why these auto-learn thresholds are so much more
conservative than the spam-flagging thresholds.

RK> A major concern of mine is that bayes won't get fed ham
RK> adequately and then get out of whack. Can anyone explain
RK> what affects the efficiency of bayes?

If you do nothing (no manual learning other than correcting FPs and FNs),
bayes probably won't get out of whack, because of the special auto-learn
thresholds.
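Correcting FPs and FNs means running sa-learn with the opposite label from the one SA gave. The small Python helper below only builds the command line; the helper name and the example path are hypothetical, and the `--ham`, `--spam` and `--mbox` spellings follow current sa-learn documentation, so check `sa-learn --help` on your own version.

```python
from typing import List


def correction_command(path: str, mistake: str, mbox: bool = False) -> List[str]:
    """Build the sa-learn argv that corrects a misclassified message or folder.

    mistake is "fp" (ham that was flagged as spam) or "fn" (spam that got through).
    """
    if mistake == "fp":
        flag = "--ham"    # relearn the false positive as ham
    elif mistake == "fn":
        flag = "--spam"   # relearn the false negative as spam
    else:
        raise ValueError("mistake must be 'fp' or 'fn'")
    cmd = ["sa-learn", flag]
    if mbox:
        cmd.append("--mbox")  # input is an mbox file, not one message per file
    cmd.append(path)
    return cmd


# Hypothetical file of false positives, stored in mbox format.
print(" ".join(correction_command("false-positives.mbox", "fp", mbox=True)))
# prints: sa-learn --ham --mbox false-positives.mbox
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; building an argv list rather than a shell string avoids quoting problems with odd folder names.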

If you go through your spam and manually sa-learn almost all of the spam,
then yes, you may get out of balance, unless you also manually sa-learn a
corresponding amount of ham.  (I wouldn't worry about trying for a 50/50
mix, but if you feed, say, 300 spam a week into sa-learn, you should
probably try to feed at least 30 ham in that same period of time.)
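The rule of thumb above works out to roughly one ham for every ten spam. A tiny Python sketch of that arithmetic follows; the 10:1 ratio is read off the 300-to-30 example and is not a SpamAssassin setting, and both function names are hypothetical.

```python
def min_ham_to_feed(spam_fed: int, spam_per_ham: int = 10) -> int:
    """Smallest ham count that keeps manual training at or under spam_per_ham : 1."""
    # Ceiling division in integers, avoiding floating-point rounding.
    return -(-spam_fed // spam_per_ham)


def is_balanced(spam_fed: int, ham_fed: int, spam_per_ham: int = 10) -> bool:
    """True if at least one ham was fed for every spam_per_ham spam."""
    return ham_fed * spam_per_ham >= spam_fed


print(min_ham_to_feed(300))   # prints 30
print(is_balanced(300, 12))   # prints False
```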

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBP0VvWJebK8E4qh1HEQJ2fACdGbmfYwDr+9sic6QmQwDWxyV6OGEAn25s
oqJk3F0ZqAXYxeIcfSuPpYrC
=PW3V
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk