I have been using SpamAssassin in my hosting environment for several years.  It catches thousands of spam messages (thank you...).  My concern is that it still misses a couple of hundred spam messages per day.  I have the Bayesian filter working, with a simple way to train it, and I have fed it over 5000 training messages over the past 6-8 months.  I have also set up a non-forwarding caching DNS server, and the blacklist tests are working.

My question is about the scoring.  I understand the general theory of adding up 'votes' from all of the spam tests to determine whether a message is indeed spam.  But it appears that no single test, no matter how certain it is, has enough power to qualify the message as spam on its own.  The Bayesian filter can say it's 80-100% certain it's spam, but some other test decides it's not, and sometimes even has a negative score that cancels out part of the Bayesian score in the total.  My biggest problem is that even when a message is scored as containing a blacklisted (BL) URL, if Bayesian doesn't also say it's spam, then it's apparently still not spam.  I spend a couple of hours every day trying to tell the Bayesian filter about today's new strains of spam that it hasn't yet seen.

Am I missing something obvious?  Is this just the way it works, and I should expect to have to run a couple of hundred missed spams through the Bayesian filter each day?  My threshold score was originally set to 5.0.  I don't even remember where that came from.  I dropped it to 4.0 a couple of years ago, and that's where it is now.  But (see example output below) when BL says it's spam and adds 2.5, then Bayesian says it's 40-60% spam and adds 0.8, and it's got a small font and gets another 0.5, and all other tests are neutral... it's now 3.8 and STILL not spam with a threshold of 4.0.
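To make the arithmetic concrete, here is a minimal Python sketch of the additive scoring as I understand it.  The rule names and points are copied from the report in this message; the function names and the "total >= threshold" comparison are my own assumptions for illustration, not SpamAssassin's actual code.

```python
# Rule hits copied from the report in this message: (points, rule name).
HITS = [
    (2.5, "URIBL_DBL_SPAM"),
    (0.0, "TVD_RCVD_IP4"),
    (0.0, "TVD_RCVD_IP"),
    (-0.0, "SPF_HELO_PASS"),
    (0.8, "BAYES_50"),
    (0.0, "HTML_MESSAGE"),
    (0.5, "JAM_SMALL_FONT_SIZE"),
    (0.0, "T_REMOTE_IMAGE"),
]
REQUIRED = 4.0  # my current threshold


def total_score(hits):
    """Add up the points of every rule that fired, rounded to one decimal
    place to match the precision the report displays."""
    return round(sum(points for points, _name in hits), 1)


def is_spam(hits, required):
    """Assumption: a message is flagged once its total reaches the threshold."""
    return total_score(hits) >= required


print(f"{total_score(HITS)} points, {REQUIRED} required, "
      f"spam: {is_spam(HITS, REQUIRED)}")
```

With these numbers the blacklist hit supplies most of the total, yet the message still falls 0.2 short of the threshold, which matches the X-Spam-Flag: NO in the report.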

Can someone tell me if this is by design and/or if my configuration should be adjusted?  I realize I can easily drop the threshold to 1.0 or 2.0.  But that would probably just shift the problem to tons of false positives, which obviously is not a good solution.

Thx.

Jerry

X-SpamAssassin_121: Content analysis details:   (3.8 points, 4.0 required)
X-SpamAssassin_122:
X-SpamAssassin_123:  pts rule name              description
X-SpamAssassin_124: ---- ---------------------- --------------------------------------------------
X-SpamAssassin_125:  2.5 URIBL_DBL_SPAM         Contains a spam URL listed in the DBL blocklist
X-SpamAssassin_126:                             [URIs: lspdiscover.com]
X-SpamAssassin_127:  0.0 TVD_RCVD_IP4           Message was received from an IPv4 address
X-SpamAssassin_128:  0.0 TVD_RCVD_IP            Message was received from an IP address
X-SpamAssassin_129: -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
X-SpamAssassin_130:  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
X-SpamAssassin_131:                             [score: 0.5013]
X-SpamAssassin_132:  0.0 HTML_MESSAGE           BODY: HTML included in message
X-SpamAssassin_133:  0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very small
X-SpamAssassin_134:                             font
X-SpamAssassin_135:  0.0 T_REMOTE_IMAGE         Message contains an external image
X-SpamAssassin_136:
X-SpamAssassin_999: --
X-Spam-Flag: NO
X-Spam-Status: No, hits=3.8 required=4.0
X-MessageIsSpamProbability: 0.0022784550478299674
