Re: it's getting worse again

Kris Deugau 6 Apr 2005 19:53:50 -0000

Florin Andrei wrote:
> I'm using SA since... well, a long time ago, and one thing that i
> noticed was a pattern in the way its efficiency varies: it's pretty
> good soon after a new release, then it gets continuously worse; then
> a new release and all of a sudden it's good again, then it starts
> "decaying" again...


I noticed this for several releases up to the 2.4x series, and to a
lesser degree in 2.5x and 2.6x.  However, I've reached a fairly stable
state with 2.64 (with the SpamCopURI "plugin"/patch) where I see maybe
two or three spams a week slipping through - at most.  I move those
messages to a "missed-spam" folder, and sa-learn that folder manually
every so often.
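
As a rough sketch of that manual step (not my exact setup): the snippet
below assumes the "missed-spam" folder is stored as an mbox file and that
sa-learn is on the PATH.  The --spam, --ham and --mbox options are standard
sa-learn options; the helper names and the path are made up for
illustration.

```python
import shutil
import subprocess


def build_sa_learn_command(mbox_path, is_spam=True):
    """Build the sa-learn argument list for one mbox folder.

    Hypothetical helper: only the option names come from sa-learn itself.
    """
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def learn_folder(mbox_path, is_spam=True):
    """Run sa-learn on the folder.

    Returns the exit code, or None when sa-learn is not installed.
    """
    cmd = build_sa_learn_command(mbox_path, is_spam)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever your mail client keeps the folder.
    print(build_sa_learn_command("/home/user/mail/missed-spam"))
```

Running something like learn_folder() from cron would automate the "every
so often" part, but doing it by hand lets you eyeball the folder first.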

Bayes and the SURBL checks have *REALLY* made a noticeable difference in
long-term accuracy.  If it weren't for SURBL, actually, I probably
would have upgraded to 3.x by now.  I also happen to maintain a
"local-use-only" DNS zone that I refer to with the SURBL check, but I
haven't added anything to it in several months.

> Well, it's been a while since the last release, and it's already
> noticeably worse. I know this has been discussed before, i am aware
> of the VirusScannerTypeUpdates FAQ entry, but you know what, from an
> end-user's point of view, it does not matter. All that matters is
> that, despite brilliant technical discussions, the efficiency is
> going down and, if a new version is not released soon enough, the
> users start to complain. This is what's happening right now.

This WILL HAPPEN if you rely entirely on static rules - spammers adjust
their tactics to avoid those rules.  That's why dynamic rules or systems
such as Bayes and SURBL are so important.  The program and rules
themselves don't have to change, just the data source they work with.
Manual feedback is NECESSARY for a well-adjusted Bayes system;  without
that feedback there's no way to guarantee that it won't behave
incorrectly on your email stream.

The SA devs could, in theory, release updated rules much more
quickly...  but then they'd be spending most of their time maintaining
and creating new rules, then going through the score-balancing process
to maximize spam detection while minimizing FPs across the official
ruleset.  That balancing is much faster these days, but IIRC it still
takes about a week (compared to ~6 weeks up until ~2.63).

The most common detail in other reports like yours (you don't say
much beyond "It's broke.  Fix it.") is that spam is hitting BAYES_99...
and nothing else.  In 2.6x, this wasn't a problem:  BAYES_99 scored over
the threshold of 5 in the default setup, and spam would be correctly
tagged in that case.  With 3.x, the BAYES_nn scores have been rather
reduced, and a number of people have reported good results from just
copying the 2.64 BAYES_nn scores.
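
To make the arithmetic concrete, here is a minimal sketch of how a message
that hits only BAYES_99 ends up on either side of the threshold.  The two
score values are illustrative placeholders, NOT the scores actually shipped
with 2.6x or 3.x; the function names are hypothetical too.

```python
REQUIRED_SCORE = 5.0  # the default threshold of 5 mentioned above

# Illustrative values only (assumptions, not the shipped scores):
# one table where BAYES_99 alone clears the threshold, one where it doesn't.
SCORES_2_6X_STYLE = {"BAYES_99": 5.4}
SCORES_3_X_STYLE = {"BAYES_99": 3.5}


def total_score(hits, scores):
    """Sum the scores of every rule the message hit (unknown rules count 0)."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """Tag the message as spam when its total reaches the threshold."""
    return total_score(hits, scores) >= required


hits = ["BAYES_99"]  # the "BAYES_99 and nothing else" case
print(is_spam(hits, SCORES_2_6X_STYLE))  # True: tagged
print(is_spam(hits, SCORES_3_X_STYLE))   # False: slips through
```

Copying the older scores amounts to overriding the BAYES_nn entries with
"score" lines in your local configuration, so that the strongest Bayes hit
can carry a message over the threshold on its own again.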

> I guess something has to change. "Then change it yourself" type of
> advices will go straight to /dev/null, thank you, because as far as
> SA is concerned, i'm just a user. I am merely pointing out the
> problem.

I'm a little puzzled about what you're asking for, then.  Addon rulesets
are available from SARE, and somewhere there's a tool to automatically
check for updates on those rules.  ISP mail administrators should at
least be able to whitelist/blacklist email addresses (or provide a way
for users to do so for themselves), and better ones will have a way for
users to submit missed spam or FPs back to be whitelisted, blacklisted,
learned by Bayes, or manually examined for possible local rules.

The core SA development team spends more of its time developing the code
that dissects the message and pulls out specific parts;  with 3.x anyone can
now (more) easily add more complex "rules" that aren't "just" simple
pattern matching but do things like counting occurrences of words or
letters - or more complex checks.  Quite a few SA "rules" rely on code
like this;  that code *can't* be quickly updated in the same way that
the SARE rulesets (for instance) can.

If you're really not interested in tweaking your SA setup, look into a
mail client with its own spam filter - recent versions of Netscape/Mozilla
have one that's pretty good, Apple Mail's is supposedly pretty good, and
IIRC KMail has one.  But ANY spam filter needs feedback on whether
the filter is working correctly - in the case of a mail program, it's
usually a few mouse clicks compared to the regex tweaking or arcane
command line magic required for SA.

If you're not the administrator of the system running SA on your mail,
talk to the person/organization that is and complain.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!
