Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your
> database to enable learning. You need to use sa-learn to push some
> spam and ham through the system.
That's not quite correct. There are no "number of learned spam/ham"
thresholds for autolearning; the threshold is a combination of a basic
score (check the Mail::SpamAssassin::Conf man page for the defaults on
your system - IIRC it's >12 for spam, <0.1 for ham) and a requirement
that at least 3 points come from header rules, and 3 from body rules.
Again, check your local man page for the specific details on your local
install. (This doesn't seem to have changed since Bayes was introduced.)
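As a rough illustration of the decision described above, here is a minimal Python sketch. It is not SpamAssassin's actual code. The threshold values are the ones recalled in this message, the function name and its three arguments are made up for the example, and applying the header/body minimum only when learning as spam is my reading of the documentation, so treat that as an assumption too.

```python
# Sketch of the autolearn decision described above. The defaults are the
# values recalled in this thread, not read from a live install.
SPAM_THRESHOLD = 12.0    # assumed default upper threshold
HAM_THRESHOLD = 0.1      # assumed default lower threshold
MIN_HEADER_POINTS = 3.0  # minimum points from header rules (assumed: spam only)
MIN_BODY_POINTS = 3.0    # minimum points from body rules (assumed: spam only)


def autolearn_decision(total, header_points, body_points):
    """Return "spam", "ham", or "no" for one scanned message.

    All three arguments are hypothetical inputs: the autolearn score and
    the parts of it contributed by header rules and by body rules.
    """
    if total > SPAM_THRESHOLD:
        # A high score alone is not enough; both rule types must contribute.
        if header_points >= MIN_HEADER_POINTS and body_points >= MIN_BODY_POINTS:
            return "spam"
        return "no"
    if total < HAM_THRESHOLD:
        return "ham"
    return "no"


if __name__ == "__main__":
    print(autolearn_decision(14.0, 1.0, 13.0))  # no: too few header points
    print(autolearn_decision(14.0, 6.0, 8.0))   # spam
    print(autolearn_decision(0.0, 0.0, 0.0))    # ham
```

The first call is the interesting case: a message can score well over the spam threshold and still show autolearn=no if nearly all of its points come from body rules.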
The Bayes subsystem will not *return* a score until the "number of
messages" thresholds are passed - by default 200 each of ham and spam.
Manual training is still highly recommended early on, to make sure you
get *accurate* training. I've got a number of systems I paid fairly
close attention to early on, when I upgraded to SA2.54 and introduced
them to Bayes support. I've *never* had to wipe and retrain any of
them. (I *do* get customer "missed-spam" reports that occasionally show
BAYES_{00,01,10} scores, but that's pretty rare, and I feed those
messages back ASAP to keep things on track. Checking those messages
afterward usually shows BAYES_50 or better.)
Richard E. Bewley, Jr. wrote:
> SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
> SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
> URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1
Richard, your system didn't autolearn this particular message because
there weren't enough hits on header rules (UNPARSEABLE_RELAY is the only
one, I think). Network tests (e.g., URIBL_*) are also ignored when
determining which scoreset to use to decide whether to autolearn. The
SARE rulesets look mostly at the message bodies, IIRC.
(from man Mail::SpamAssassin::Conf)
    Note that certain tests are ignored when determining whether a
    message should be trained upon:
      - rules with tflags set to 'learn' (the Bayesian rules)
      - rules with tflags set to 'userconf' (user white/black-listing
        rules, etc)
      - rules with tflags set to 'noautolearn'
    Also note that auto-training occurs using scores from either
    scoreset 0 or 1, depending on what scoreset is used during message
    check. It is likely that the message check and auto-train scores
    will be different.
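To make the man page excerpt concrete, here is a small Python sketch of how the ignored tflags change the score that autolearning sees. This is an illustration only, not SpamAssassin code. The function name, the tuple layout, the rule name USER_RULE_EXAMPLE, and all of the scores are invented for the example.

```python
# tflags named in the man page excerpt as ignored for autolearning.
EXCLUDED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of the hits that count toward autolearning.

    `hits` is a hypothetical list of (rule_name, score, tflags) tuples,
    where tflags is a set of strings.
    """
    return sum(
        score
        for _name, score, tflags in hits
        if not (set(tflags) & EXCLUDED_TFLAGS)
    )


if __name__ == "__main__":
    # Made-up scores, chosen only to show the effect of the filter.
    hits = [
        ("BAYES_99", 3.5, {"learn"}),                 # ignored: Bayesian rule
        ("USER_RULE_EXAMPLE", -100.0, {"userconf"}),  # ignored: user list rule
        ("SARE_PRODUCTS_02", 1.5, set()),             # counted
        ("URI_NOVOWEL", 2.0, set()),                  # counted
    ]
    print(autolearn_score(hits))  # 3.5
```

This is one reason the score reported for the message check and the score used for auto-training can differ for the same message.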
-kgd