Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your
> database to enable learning.  You need to use sa-learn to push some
> spam and ham through the system.

That's not quite correct. Autolearning has no "number of learned spam/ham" threshold. The threshold is a combination of a basic score (check the Mail::SpamAssassin::Conf man page for the defaults on your system; IIRC it's >12 for spam, <0.1 for ham) and a requirement that at least 3 points come from header rules and 3 from body rules. Again, check your local man page for the specific details on your local install. (This doesn't seem to have changed since Bayes was introduced.)
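
As a rough illustration of that decision (a sketch only, not SpamAssassin's actual code), here it is in Python. The function and constant names are made up for this example. The numbers are the defaults as recalled above, and I'm assuming the 3-point header/body requirement applies only when learning as spam and that the comparisons are "below" for ham and "at or above" for spam; check your local man page for both.

```python
from typing import Optional

# Defaults as recalled in the post; your install may differ.
SPAM_THRESHOLD = 12.0      # autolearn-as-spam score threshold
HAM_THRESHOLD = 0.1        # autolearn-as-ham score threshold
MIN_SECTION_POINTS = 3.0   # points needed from header rules AND from body rules


def autolearn_decision(score: float,
                       header_points: float,
                       body_points: float) -> Optional[str]:
    """Return "ham", "spam", or None (do not autolearn).

    Hypothetical helper: `score` is the autolearn score, `header_points`
    and `body_points` are the parts of it contributed by header rules
    and body rules respectively.
    """
    if score < HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        # Assumption: the per-section minimum only gates spam learning.
        if (header_points >= MIN_SECTION_POINTS
                and body_points >= MIN_SECTION_POINTS):
            return "spam"
        return None
    return None


if __name__ == "__main__":
    # High score, but almost all of it from body rules: not learned.
    print(autolearn_decision(20.0, 0.5, 19.5))
    # High score with enough points from both sections: learned as spam.
    print(autolearn_decision(15.0, 4.0, 5.0))
```

Note that nothing in this check looks at how many messages have already been learned.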

The Bayes subsystem will not *return* a score until the "number of messages" thresholds are passed - by default 200 each of ham and spam.
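
To make the distinction concrete, the gate on *scoring* can be sketched like this in Python. The function name is hypothetical; the 200/200 defaults are the ones mentioned above (the bayes_min_spam_num and bayes_min_ham_num settings), and the learned counts would come from something like the nspam/nham lines of `sa-learn --dump magic`. I'm assuming the check is "at least the minimum" for both counts.

```python
def bayes_will_score(nspam: int, nham: int,
                     min_spam: int = 200, min_ham: int = 200) -> bool:
    """Hypothetical helper: True once enough spam AND ham have been learned
    for the Bayes rules to start contributing a score."""
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    # Plenty of spam learned, but too little ham: no BAYES_* hits yet.
    print(bayes_will_score(nspam=1500, nham=40))
    # Both counts past the minimum: Bayes starts returning a score.
    print(bayes_will_score(nspam=250, nham=200))
```

This gate only controls whether Bayes contributes a score; it plays no part in deciding whether a message is autolearned.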

Manual training is still highly recommended early on, to make sure you get *accurate* training. I've got a number of systems I paid fairly close attention to early on, when I upgraded to SA2.54 and introduced them to Bayes support. I've *never* had to wipe and retrain any of them. (I *do* get customer "missed-spam" reports that occasionally show BAYES_{00,01,10} scores, but that's pretty rare, and I feed those messages back ASAP to keep things on track. Checking those messages afterward usually shows BAYES_50 or better.)

Richard E. Bewley, Jr. wrote:
SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
      SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
      URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1

Richard, your system didn't autolearn this particular message because there weren't enough hits on header rules. UNPARSEABLE_RELAY is the only one, I think; network tests (e.g., URIBL_*) are also ignored for determining which scoreset to use to decide whether to autolearn. The SARE rulesets look mostly at the message bodies, IIRC.

(from man Mail::SpamAssassin::Conf)
    Note that certain tests are ignored when determining whether a
    message should be trained upon:

     - rules with tflags set to 'learn' (the Bayesian rules)
     - rules with tflags set to 'userconf' (user white/black-listing
       rules, etc)
     - rules with tflags set to 'noautolearn'

    Also note that auto-training occurs using scores from either
    scoreset 0 or 1, depending on what scoreset is used during message
    check.  It is likely that the message check and auto-train scores
    will be different.
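
A minimal Python sketch of the exclusion described in that excerpt: rules carrying any of the three tflags are dropped before the autolearn score is added up. The function name, the tuple layout, and the scores in the example are all invented for illustration; real scores depend on your rules and scoreset.

```python
from typing import Iterable, Set, Tuple

# tflags whose rules are ignored for autolearn purposes (from the man page).
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}

# (rule name, score, tflags) -- a made-up representation of a rule hit.
Hit = Tuple[str, float, Set[str]]


def autolearn_points(hits: Iterable[Hit]) -> float:
    """Sum the scores of hits that count toward the autolearn decision."""
    return sum(score for _name, score, tflags in hits
               if not (tflags & IGNORED_TFLAGS))


if __name__ == "__main__":
    hits = [
        ("BAYES_99", 3.5, {"learn"}),               # Bayesian rule: ignored
        ("USER_IN_BLACKLIST", 100.0, {"userconf"}), # user list rule: ignored
        ("URI_NOVOWEL", 0.5, set()),                # ordinary rule: counted
    ]
    print(autolearn_points(hits))
```

This is one reason the score used for autolearning can differ from the score reported for the message check.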

-kgd
