Re: DKIM, SPF and Bayesian Learning

Bill Cole Tue, 21 Jul 2015 19:56:01 -0700

On 21 Jul 2015, at 20:55, Roman Gelfand wrote:

It seems that if DKIM or SPF is verified, the bayesian learning doesn't
matter.

Not so. Perhaps you need to refresh your understanding of what SpamAssassin is. It is not a collection of binary switches, but rather a scoring system consisting of rules which have various scores.

How much each rule matters is a local decision, subject to default values

X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no version=3.3.2

3.3.2 is rather obsolete, but I still have the default rules laying about...

/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_99 0 0 3.8 3.5
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score BAYES_999 0 0 0.2 0.2
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_SIGNED 0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score DKIM_VALID_AU -0.1
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score HTML_MESSAGE 0.001
/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf:score SPF_PASS -0.001

The arithmetic, assuming you allow network tests: The 2 Bayes rules (de facto Bayes certitude of spamminess) only add up to 3.7. All of the DKIM and SPF crap nets out to -0.101, vastly overstating their value in making spam/ham decisions, which in fact is indistinguishable from zero as independent rules. However, that remains a small mitigation relative to the Bayes rules, which are much more reliable but still subject to error by their nature as statistically-derived values. This is consistent with your shown score and a reasonable understanding of spam.
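For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It is only an illustration, not anything SpamAssassin itself runs. The scores are copied from the 50_scores.cf lines quoted in this message; for the four-column Bayes entries it takes the last column, on the assumption that both Bayes and network tests are enabled. The names SCORES and subtotal are made up for this example.

```python
from decimal import Decimal

# Scores copied from the 50_scores.cf lines quoted in this message.
# Assumption: Bayes and network tests are both enabled, so the last of
# the four columns applies to BAYES_99 and BAYES_999.
SCORES = {
    "BAYES_99": Decimal("3.5"),
    "BAYES_999": Decimal("0.2"),
    "DKIM_SIGNED": Decimal("0.1"),
    "DKIM_VALID": Decimal("-0.1"),
    "DKIM_VALID_AU": Decimal("-0.1"),
    "HTML_MESSAGE": Decimal("0.001"),
    "SPF_PASS": Decimal("-0.001"),
}


def subtotal(prefixes):
    """Sum the scores of all rules whose names start with one of the prefixes."""
    return sum(
        (score for name, score in SCORES.items() if name.startswith(prefixes)),
        Decimal("0"),
    )


bayes = subtotal(("BAYES_",))          # the two Bayes rules
auth = subtotal(("DKIM_", "SPF_"))     # everything DKIM and SPF contribute
total = sum(SCORES.values(), Decimal("0"))

print("Bayes rules:", bayes)
print("DKIM + SPF: ", auth)
print("All rules:  ", total)
print("Rounded:    ", round(total, 1))  # compare with score= in X-Spam-Status
```

The Bayes subtotal comes to 3.7, the DKIM and SPF rules to -0.101, and the sum over all seven rules rounds to the 3.6 shown in the header.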

On the other hand, if you really trust your Bayes DB and have a particular widespread flavor of spam hitting you, that precise set of rules (including HTML_MESSAGE) makes an excellent 'meta' rule worth a solid half point, and if you don't have a lot of non-spam marketing mail that you get voluntarily, you can probably lower your threshold to 4.5 or maybe even 4. Try this first on a personal mail server, NOT on one handling mail for a broad audience including people who can fire you (until after you've analyzed the mail stream very carefully.)
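To see what those two knobs would do to the message in question, here is a rough Python sketch. It assumes the 3.6 from the X-Spam-Status header as the starting score, treats the meta rule as a flat 0.5 added on top, and assumes a message counts as spam once its score reaches the required value. The names BASE_SCORE, META_BONUS and verdict are invented for this example.

```python
from decimal import Decimal

BASE_SCORE = Decimal("3.6")  # score= from the X-Spam-Status header in this thread
META_BONUS = Decimal("0.5")  # the "solid half point" for a hypothetical local meta rule


def verdict(score, required):
    """Assumption: a message is tagged as spam once score reaches required."""
    return "spam" if score >= required else "ham"


# Compare the stock threshold with the two lowered ones mentioned above.
for required in (Decimal("5.0"), Decimal("4.5"), Decimal("4.0")):
    without_meta = verdict(BASE_SCORE, required)
    with_meta = verdict(BASE_SCORE + META_BONUS, required)
    print(f"required={required}: without meta={without_meta}, with meta={with_meta}")
```

With these numbers the message lands at 4.1 once the meta rule fires, so it would only be tagged at a threshold of 4; at 4.5 it still passes. That is one more reason to test against your own mail stream before changing anything.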
