On 21 Jul 2015, at 20:55, Roman Gelfand wrote:
It seems that if DKIM or SPF is verified, the bayesian learning doesn't
matter.
Not so. Perhaps you need to refresh your understanding of what
SpamAssassin is. It is not a collection of binary switches, but rather a
scoring system consisting of rules which have various scores.
How much each rule matters is a local decision, subject to default
values.
X-Spam-Status: No, score=3.6 required=5.0
tests=BAYES_99,BAYES_999,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no
version=3.3.2
3.3.2 is rather obsolete, but I still have the default rules lying
about...
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_99 0 0 3.8 3.5
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_999 0 0 0.2 0.2
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_SIGNED 0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID_AU -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score HTML_MESSAGE 0.001
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score SPF_PASS -0.001
The arithmetic, assuming you allow network tests: the 2 Bayes rules (de
facto Bayes certitude of spaminess) add up to only 3.7. All of the DKIM
and SPF crap nets out to -0.101, which vastly overstates its value in
making spam/ham decisions; as independent rules, that value is in fact
indistinguishable from zero. However, -0.101 remains a small mitigation
relative to the Bayes rules, which are much more reliable but still
subject to error by their nature as statistically-derived values. With
HTML_MESSAGE adding 0.001, the total is 3.6, which is consistent with
your shown score and a reasonable understanding of spam.
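
If you want to check that sum yourself, here is a minimal Python
sketch. It assumes the last column of each 4-value score line is the
one in effect (Bayes plus network tests); the SCORES table and the
subtotal() helper are just names made up for this illustration, not
anything SpamAssassin provides.

```python
# Scores copied from the 50_scores.cf lines quoted above.
# Assumption: for the 4-value entries, the last column (Bayes enabled,
# network tests enabled) is the one that applies.
SCORES = {
    "BAYES_99": 3.5,
    "BAYES_999": 0.2,
    "DKIM_SIGNED": 0.1,
    "DKIM_VALID": -0.1,
    "DKIM_VALID_AU": -0.1,
    "HTML_MESSAGE": 0.001,
    "SPF_PASS": -0.001,
}


def subtotal(rules):
    """Sum the scores of the named rules, rounded to 3 decimal places."""
    return round(sum(SCORES[name] for name in rules), 3)


bayes = subtotal(["BAYES_99", "BAYES_999"])
auth = subtotal(["DKIM_SIGNED", "DKIM_VALID", "DKIM_VALID_AU", "SPF_PASS"])
total = subtotal(SCORES)  # every rule that hit on this message

print(bayes, auth, total)  # prints: 3.7 -0.101 3.6
```

The rounding is only there to hide floating-point noise; the header
itself reports the score to one decimal place.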
On the other hand, if you really trust your Bayes DB and have a
particular widespread flavor of spam hitting you, that precise set of
rules (including HTML_MESSAGE) makes an excellent 'meta' rule worth a
solid half point, and if you don't have a lot of non-spam marketing mail
that you get voluntarily, you can probably lower your threshold to 4.5
or maybe even 4. Try this first on a personal mail server, NOT on one
handling mail for a broad audience including people who can fire you
(at least not until you've analyzed the mail stream very carefully).
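
To see what that would do to the message in question, here is a rough
Python sketch of the decision. It is only a model, not SpamAssassin
itself: COMPONENTS, META_BONUS and final_score() are hypothetical names,
the 3.6 base is taken from the X-Spam-Status header, and it assumes a
message is tagged as spam when its score reaches the threshold.

```python
# Rules that must ALL hit for the hypothetical meta rule to fire.
COMPONENTS = {
    "BAYES_99", "BAYES_999", "DKIM_SIGNED", "DKIM_VALID",
    "DKIM_VALID_AU", "HTML_MESSAGE", "SPF_PASS",
}
META_BONUS = 0.5   # the "solid half point" suggested above
BASE_SCORE = 3.6   # score reported in the X-Spam-Status header


def final_score(hits, base_score):
    """Add the meta bonus only if every component rule is in hits."""
    if COMPONENTS <= set(hits):
        return round(base_score + META_BONUS, 1)
    return base_score


# The tests= list from the header: all seven component rules hit.
hits = ["BAYES_99", "BAYES_999", "DKIM_SIGNED", "DKIM_VALID",
        "DKIM_VALID_AU", "HTML_MESSAGE", "SPF_PASS"]
score = final_score(hits, BASE_SCORE)

for threshold in (5.0, 4.5, 4.0):
    verdict = "spam" if score >= threshold else "not spam"
    print(f"threshold {threshold}: score {score} -> {verdict}")
```

Under those assumptions the message lands at 4.1, so it is tagged only
at a threshold of 4; at 4.5 it would still get through, which is one
more reason to look at your own mail stream before picking a number.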