Ever since I "upgraded" to the 3.x series, I've seen a major jump
in the amount of spam getting through.
Initially my upgrade was to 3.0.2 as distributed in SuSE 9.3, and
my problems were related to old configuration files/options: NONE
of my spam was being tagged into the spam folder (i.e. the SPAM marker
my filtering system relies on wasn't being added to the subject).
I've cleaned all of the "lint" out of my config files, ported my old
DB to the new format, and even ran the learning mechanism over several
old SPAM archives (~150 MB) and current HAM input folders (~100 MB).
About 100 spams a day are getting through and requiring manual
processing, while about 100 a day are being correctly filtered into
the spam folder. That's a huge drop in detection. I've tried dialing
down the threshold from the default to my previous setting of 5, then
to 4.8, not wanting to be overly aggressive. But I'm wondering whether
the default weightings for the various tests changed between the 2.6.x
and 3.0.x series.
I note there's a new 3.1.0 release, but I saw no improvement going from
3.0.2 to 3.0.4.
It _seems_ like some of the test weightings may have changed, which is
throwing off the classifier. I'll see multiple copies of identical spams
going to different email addresses on my server, most often with
"Subject: Re[<x>]:", where x=[0-9]. They are the most numerous offenders,
arriving in multiple accounts at nearly the same time (or a few seconds
apart). So a single missed message turns into duplicate spam in several
accounts, and my multiple personalities, er, um, "users" :-), are getting
annoyed with me.
Also of note: "sa-learn" is MUCH slower in 3.0.x than it was in 2.6.x,
though with the compiled "spamc" client I can see that incoming spam is
processed with a lower load on the server.
One voice in my head says, screw it, stop your whining and go back to what
worked (2.6.x), but another part of me says 3.x is where the future is, and
if there's a problem in my setup, I should take the time to figure out
what it is and make it work.
Here is a partial header from one such message:
X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
----
The content was a multi-part message in MIME format, with the same message
in plain and HTML text:
"<content deleted due to spamassassin filtering>"
------
The content advertised a product for increasing one's chance of
producing offspring via chance encounters with receptive female
partners. Is 5.0 too high a default in 3.x? I would also have
expected the HTML part to count for a little more...
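A quick way to see why this one slipped through: adding up the rule scores
shown in the report gives 0.0 + 3.5 = 3.5, which falls short of both 5.0 and
4.8, so even a 99-100% Bayes verdict can't tag the message on its own. The
minimal Python sketch below just does that arithmetic. It assumes the partial
header lists every rule that fired, and that a message is tagged when its
total is at or above the threshold (my understanding of how required_score
is applied, not something checked against the source). The is_tagged helper
is hypothetical, for illustration only.

```python
# Rule scores copied from the partial X-Spam-Report above. The header is
# partial, so this ASSUMES no other rules fired on the message.
report_scores = {
    "HTML_MESSAGE": 0.0,
    "BAYES_99": 3.5,
}


def is_tagged(scores, threshold):
    """Hypothetical helper: sum the rule scores and compare to the threshold.

    Returns (total, tagged), where tagged is ASSUMED to mean total >= threshold.
    """
    total = sum(scores.values())
    return total, total >= threshold


# Check the stock 5.0 threshold and the lowered 4.8 one.
for threshold in (5.0, 4.8):
    total, tagged = is_tagged(report_scores, threshold)
    print(f"threshold {threshold}: total {total} -> tagged={tagged}")
```

With only these two rules firing, the threshold would have to drop below 3.5
before this message got tagged, which suggests that lowering it a few tenths
won't catch this batch.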
Oops, another batch of 80+ just came in... SA: tastes great, less filling!
Re: my first posting attempt:
<<< 552 spam score (9.1) exceeded threshold
<BZZZT!!! NO discussion involving things that look too much like spam
on a list designed to talk about a tool to detect such spam>
And ironically, the irony of this restriction may never be known if this
note never makes it to the list...;-/.
Sigh,
Linda