Scoring Philosophy?

Jerry Malcolm Tue, 21 Nov 2017 13:02:16 -0800

I have been using SpamAssassin in my hosting environment for several years. It catches thousands of spam messages (thank you...). But my concern is that it doesn't catch a couple of hundred messages per day. I have the Bayesian filter working, with a simple way to train it. I have sent over 5000 training messages to it over the past 6-8 months. I have set up a non-forwarding caching DNS, and the black list tests are working.

My question is with the scoring. I understand the general theory of adding up 'votes' by all of the spam tests to determine if it's indeed spam. But it appears that no one test, no matter how certain it is, has enough power to qualify the message as spam. The Bayesian filter can say it's 80-100% certain it's spam. But some other test decides it's not, and sometimes even has a negative score that cancels the Bayesian score out of the total. My biggest problem, though, is that even if a message is scored as containing a BL URL, if Bayesian doesn't also say it's spam, then it's apparently still not spam. I spend a couple of hours every day trying to tell the Bayesian filter about today's new strains of spam that it hasn't yet seen.

Am I missing something obvious? Is this just the way it works, and I should expect to have to run a couple of hundred missed spams through the Bayesian filter each day? My threshold score was originally set to 5.0. I don't even remember where that came from. I dropped it to 4.0 a couple of years ago, and that's where it is now. But (see example output below) when BL says it's spam and adds 2.5, then Bayesian says it's 40-60% spam and adds 0.8, and it's got a small font and gets another 0.5, and all other tests are neutral... it's now 3.8 and STILL not spam with a threshold of 4.0.
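To make the arithmetic concrete, here is a minimal Python sketch of the additive scoring as I understand it. The rule names and scores are copied from the example output below; the function names are made up for illustration, and the "total >= threshold means spam" comparison is my assumption about how the final decision is made, not something taken from the SpamAssassin source.

```python
# Rule scores copied from the example report in this message.
hits = {
    "URIBL_DBL_SPAM": 2.5,
    "BAYES_50": 0.8,
    "JAM_SMALL_FONT_SIZE": 0.5,
    "TVD_RCVD_IP4": 0.0,
    "TVD_RCVD_IP": 0.0,
    "SPF_HELO_PASS": -0.0,
    "HTML_MESSAGE": 0.0,
    "T_REMOTE_IMAGE": 0.0,
}

REQUIRED = 4.0  # my current threshold


def total_score(rule_scores):
    """Add up every rule's score; round to one decimal like the report."""
    return round(sum(rule_scores.values()), 1)


def is_spam(rule_scores, required=REQUIRED):
    """Assumed decision rule: spam only if the total reaches the threshold."""
    return total_score(rule_scores) >= required


print(total_score(hits), is_spam(hits))  # prints: 3.8 False
```

No single rule here gets near 4.0 on its own, so the blocklist hit only counts if enough other rules agree with it.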

Can someone tell me if this is by design and/or if my configuration should be adjusted? I realize I can easily drop the threshold to 1.0 or 2.0. But that would probably just shift the problem to tons of false-positives which obviously is not a good solution.


Thx.

Jerry

X-SpamAssassin_121: Content analysis details:   (3.8 points, 4.0 required)
X-SpamAssassin_122:
X-SpamAssassin_123:  pts rule name              description
X-SpamAssassin_124: ---- ---------------------- --------------------------------------------------
X-SpamAssassin_125:  2.5 URIBL_DBL_SPAM         Contains a spam URL listed in the DBL blocklist
X-SpamAssassin_126:                             [URIs: lspdiscover.com]
X-SpamAssassin_127:  0.0 TVD_RCVD_IP4           Message was received from an IPv4 address
X-SpamAssassin_128:  0.0 TVD_RCVD_IP            Message was received from an IP address
X-SpamAssassin_129: -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
X-SpamAssassin_130:  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
X-SpamAssassin_131:                             [score: 0.5013]
X-SpamAssassin_132:  0.0 HTML_MESSAGE           BODY: HTML included in message
X-SpamAssassin_133:  0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very small
X-SpamAssassin_134:                             font
X-SpamAssassin_135:  0.0 T_REMOTE_IMAGE         Message contains an external image
X-SpamAssassin_136:
X-SpamAssassin_999: --
X-Spam-Flag: NO
X-Spam-Status: No, hits=3.8 required=4.0
X-MessageIsSpamProbability: 0.0022784550478299674
