Re: Bayes autolearn configuration

Kris Deugau Wed, 05 Jul 2006 08:21:58 -0700

Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your
> database to enable learning.  You need to use sa-learn to push some
> spam and ham through the system.

That's not quite correct. There are no "number of learned spam/ham"
thresholds for autolearning; the threshold is a combination of a basic
score (check the Mail::SpamAssassin::Conf man page for the defaults on
your system - IIRC it's >12 for spam, <0.1 for ham) and a requirement
that at least 3 points come from header rules, and 3 from body rules.
Again, check your local man page for the specific details on your local
install. (This doesn't seem to have changed since Bayes was introduced.)
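
For illustration, the decision described above can be sketched in a few
lines of Python. This is not SpamAssassin's code: the function and argument
names are made up, the defaults are just the from-memory figures quoted
above, and applying the header/body minimums to the spam side only is an
assumption to verify against your local man page.

```python
# Illustrative only: NOT SpamAssassin's actual code.
# Defaults mirror the figures quoted from memory above; check your man page.
def autolearn_decision(score, header_points, body_points,
                       spam_threshold=12.0, ham_threshold=0.1,
                       min_header=3.0, min_body=3.0):
    """Return 'spam', 'ham', or 'no' for one scanned message."""
    if score > spam_threshold:
        # Assumption: the header/body minimums gate the spam side only.
        if header_points >= min_header and body_points >= min_body:
            return "spam"
        return "no"
    if score < ham_threshold:
        return "ham"
    return "no"


# A high-scoring message whose points come almost entirely from body rules
# is still not learned, because the header minimum is not met.
print(autolearn_decision(15.0, 0.5, 14.5))
```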

The Bayes subsystem will not *return* a score until the "number of
messages" thresholds are passed - by default 200 each ham and spam.
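
To make the distinction concrete: the message-count check gates whether
Bayes contributes a score at all, and is separate from the autolearn
decision. A minimal sketch, with hypothetical names and the default of 200
quoted above (treating "passed" as "at least 200" is an assumption):

```python
# Illustrative only: hypothetical helper, not part of SpamAssassin.
def bayes_will_score(learned_spam, learned_ham, min_spam=200, min_ham=200):
    """True once enough spam AND ham have been learned for Bayes to score."""
    return learned_spam >= min_spam and learned_ham >= min_ham


# Plenty of spam learned but too little ham: Bayes stays silent.
print(bayes_will_score(learned_spam=950, learned_ham=40))
```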

Manual training is still highly recommended early on, to make sure you
get *accurate* training. I've got a number of systems I paid fairly
close attention to early on, when I upgraded to SA2.54 and introduced
them to Bayes support. I've *never* had to wipe and retrain any of
them. (I *do* get customer "missed-spam" reports that occasionally show
BAYES_{00,01,10} scores, but that's pretty rare, and I feed those
messages back ASAP to keep things on track. Checking those messages
afterward usually shows BAYES_50 or better.)

Richard E. Bewley, Jr. wrote:

> SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
>       SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
>       URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1

Richard, your system didn't autolearn this particular message because
there weren't enough hits on header rules (UNPARSEABLE_RELAY is it, I
think; network tests (eg, URIBL_*) are also ignored for determining
which scoreset to use to decide whether to autolearn). The SARE
rulesets look mostly at the message bodies IIRC.


(from man Mail::SpamAssassin::Conf)
    Note that certain tests are ignored when determining whether a
    message should be trained upon:

     - rules with tflags set to 'learn' (the Bayesian rules)
     - rules with tflags set to 'userconf' (user white/black-listing
       rules, etc)
     - rules with tflags set to 'noautolearn'

    Also note that auto-training occurs using scores from either
    scoreset 0 or 1, depending on what scoreset is used during message
    check.  It is likely that the message check and auto-train scores
    will be different.
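
As a rough sketch of what that excerpt means in practice, the autolearn
total can be thought of as the sum of rule scores after dropping any rule
carrying one of the three tflags listed. The data layout, helper name, and
the scores below are invented for the example; only the tflag names come
from the man page text.

```python
# Illustrative only: hypothetical data layout, made-up scores.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_points(hits):
    """Sum scores of rule hits that count toward the autolearn decision.

    hits: iterable of (rule_name, score, tflags) tuples, tflags being a set.
    """
    return sum(score for _name, score, tflags in hits
               if not (set(tflags) & IGNORED_TFLAGS))


hits = [
    ("BAYES_99", 3.5, {"learn"}),                # Bayesian rule: ignored
    ("USER_IN_BLACKLIST", 100.0, {"userconf"}),  # user list rule: ignored
    ("SARE_PRODUCTS_02", 1.2, set()),            # ordinary rule: counted
]
print(autolearn_points(hits))
```

Only the ordinary rule contributes here, so a message can score very high
at check time and still fall short of the autolearn threshold.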

-kgd
