I have been using SpamAssassin in my hosting environment for several
years. It catches thousands of spam messages (thank you...). But my
concern is that it doesn't catch a couple of hundred messages per day.
I have the Bayesian filter working, with a simple way to train it. I
have sent over 5000 training messages to it over the past 6-8 months. I
have set up a non-forwarding caching DNS resolver, and the blacklist
tests are working.

My question is about the scoring. I understand the general theory of
adding up 'votes' from all of the spam tests to determine whether a
message is indeed spam. But it appears that no single test, no matter
how certain it is, has enough power to qualify the message as spam. The
Bayesian filter can say it's 80-100% certain a message is spam, but some
other test decides it's not, and sometimes even has a negative score
that cancels out the Bayesian score in the total. My biggest problem is
that even when a message is scored as containing a blacklisted URL, if
Bayesian doesn't also say it's spam, then it's apparently still not
spam. I spend a couple of hours every day trying to tell the Bayesian
filter about today's new strains of spam that it hasn't yet seen.

Am I missing something obvious? Is this just the way it works, and I
should expect to have to run a couple of hundred missed spams through
the Bayesian filter each day? My threshold score was originally set to
5.0. I don't even remember where that came from. I dropped it to 4.0 a
couple of years ago, and that's where it is now. But (see the example
output below) when the URI blacklist says it's spam and adds 2.5,
Bayesian says it's 40-60% likely spam and adds 0.8, a small-font rule
adds another 0.5, and all other tests are neutral, the total is 3.8 and
STILL not spam with a threshold of 4.0.

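To make the arithmetic concrete, here is a minimal Python sketch of the
additive scoring as I understand it. The rule names and points are copied
from the example report; the helper names (total_score, is_spam) are
hypothetical, and the "total >= required" comparison is my assumption
about how the threshold is applied, not SpamAssassin's actual code.

```python
# Points copied from the example report. The additive model below is a
# simplified illustration, not SpamAssassin's real implementation.
hits = {
    "URIBL_DBL_SPAM": 2.5,
    "TVD_RCVD_IP4": 0.0,
    "TVD_RCVD_IP": 0.0,
    "SPF_HELO_PASS": -0.0,
    "BAYES_50": 0.8,
    "HTML_MESSAGE": 0.0,
    "JAM_SMALL_FONT_SIZE": 0.5,
    "T_REMOTE_IMAGE": 0.0,
}

REQUIRED = 4.0  # threshold shown in the report ("4.0 required")


def total_score(rule_scores):
    """Sum the per-rule points, rounded to one decimal like the report."""
    return round(sum(rule_scores.values()), 1)


def is_spam(rule_scores, required=REQUIRED):
    """Assumed rule: flag the message only if the total reaches the threshold."""
    return total_score(rule_scores) >= required


if __name__ == "__main__":
    total = total_score(hits)
    print(f"total={total} required={REQUIRED} spam={is_spam(hits)}")
    print(f"shortfall={round(REQUIRED - total, 1)}")
```

Under these assumptions the three non-zero rules add up to 3.8, which is
0.2 short of the threshold, so no single rule here can tip the message
over on its own.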
Can someone tell me if this is by design and/or if my configuration
should be adjusted? I realize I can easily drop the threshold to 1.0 or
2.0. But that would probably just shift the problem to tons of false
positives, which obviously is not a good solution.

Thx.
Jerry
X-SpamAssassin_121: Content analysis details: (3.8 points, 4.0 required)
X-SpamAssassin_122:
X-SpamAssassin_123:  pts rule name              description
X-SpamAssassin_124: ---- ---------------------- --------------------------------------------------
X-SpamAssassin_125:  2.5 URIBL_DBL_SPAM         Contains a spam URL listed in the DBL blocklist
X-SpamAssassin_126:                             [URIs: lspdiscover.com]
X-SpamAssassin_127:  0.0 TVD_RCVD_IP4           Message was received from an IPv4 address
X-SpamAssassin_128:  0.0 TVD_RCVD_IP            Message was received from an IP address
X-SpamAssassin_129: -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
X-SpamAssassin_130:  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
X-SpamAssassin_131:                             [score: 0.5013]
X-SpamAssassin_132:  0.0 HTML_MESSAGE           BODY: HTML included in message
X-SpamAssassin_133:  0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very small
X-SpamAssassin_134:                             font
X-SpamAssassin_135:  0.0 T_REMOTE_IMAGE         Message contains an external image
X-SpamAssassin_136:
X-SpamAssassin_999: --
X-Spam-Flag: NO
X-Spam-Status: No, hits=3.8 required=4.0
X-MessageIsSpamProbability: 0.0022784550478299674