Re: Advice for a weekend spam assassin?

jdow Fri, 10 Jun 2005 10:57:29 -0700

1) You need to visit http://www.rulesemporium.com/ and select at least
   a few of the SARE rule sets. They really do help SA performance.
2) I found best results here if I bumped the BAYES_99 rule up to 5
   points. So far I have not seen that trigger on a ham message with
   per-user Bayes. That per-user Bayes is important. Shared Bayes is not
   nearly as effective and should be banned in Boston - and the rest
   of the world, too. It's a copout. Users MUST be prepared to help
   by training their personal filters. Otherwise they must accept
   increased spam escapes.
3) 3.0.4 is out. It installs nicely. (But give it a lot of time for
   some of its tests. My first shot at a CPAN install I thought it
   had died or locked up on a couple tests.)
4) 5 is a good threshold. NEVER discard messages marked as spam unless
   you do this at a rather high markup level. (SARE rules help make THAT
   happen.) A subject markup that includes the spam score is handy for
   the users. (I use a three digit markup since I have seen really nasty
   messages rack up 100 point scores here - on small score rules.) Then
   the user can feed *** SPAM(099) *** messages into a spam folder by
   sorting on the "*** SPAM" part. They should review the contents before
   discarding. Sort the mailbox alphabetically and look at the low scores
   briefly - a minute suffices for me even when I see something peculiar
   I want to make sure is already properly Bayesed. (You can verb ANY
   noun. {^_-})
5) For children's accounts modify the procedure so that their parent can
   vet the mail and drop any false markups into their children's folders.
   If the parents take a little extra time they can take the false markup
   message and extract the real message attachment to put in the child's
   mailbox. That part is up to them.
6) Do NOT use autolearn or autowhitelist. The idea is intriguing but I
   see too many busted Bayes databases from those abuse tools. I train
   manually rather seldom. About every 6 months I remember to run some
   random batches of ham through the ham training. Every time I see a
   very low score spam (or an escaped spam) with low Bayes I train on
   those messages. Otherwise I just let it perk along doing its thing.
   I do use wetware Bayes phrase filtering better known as the SARE
   rule sets and update them periodically.
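
   The subject markup from item 4 can be sketched in a few lines of
   Python. This is only an illustration of why a fixed three-digit score
   makes alphabetical sorting work; it is not SpamAssassin code, and the
   function name and the sample subjects and scores are made up for the
   example.

```python
def spam_subject(subject: str, score: float) -> str:
    """Prefix a subject with a three-digit, zero-padded spam score."""
    # Clamp to 0..999 so the tag is always exactly three digits wide.
    points = max(0, min(999, int(score)))
    return f"*** SPAM({points:03d}) *** {subject}"


# Hypothetical (subject, score) pairs, for illustration only.
messages = [
    ("Mortgage Rate Report", 99.4),
    ("Cheap meds", 5.2),
    ("You won", 100.0),
]

# A plain alphabetical sort now lists the lowest scores first.
tagged = sorted(spam_subject(subject, score) for subject, score in messages)
for line in tagged:
    print(line)
```

   With zero padding, "005" sorts before "099" and "100", so the low
   scores that deserve a second look land at the top of the folder.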


Practical results:
   About 1 escaped spam a day out of 300+ spams.
   About 2 mismarks a day chiefly from the Linux Kernel Mailing List.
       (Patch sets and bug reports with dumps confuse the SARE rules.)
       (And sometimes AOL mails come through mismarked because they
       yet again screwed up their server configuration.)
   Specifically: Yesterday out of 700+ messages I had no escaped spam
       and 3 mismarked LKML messages. In the last 9 hours I've already
       had one Mexican language spam get through. That may be my
       escaped spam for the day or I might get another. No ham has been
       mismarked.
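
   As a rough check on those numbers, the catch rate can be worked out
   directly. A minimal sketch, taking "300+" as exactly 300 (an
   assumption; the real daily count is higher, so the real rate is at
   least this good):

```python
def catch_rate(escaped: int, total_spam: int) -> float:
    """Percentage of spam that was caught rather than escaping."""
    return 100.0 * (total_spam - escaped) / total_spam


# 1 escaped spam out of an assumed 300 spams in a day.
rate = catch_rate(1, 300)
print(f"{rate:.1f}%")
```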

{^_^}   Thus be Joanne's configuration du jour. By the way, I use some
        43 of the SARE and other rule sets. I go a trifle overboard,
        methinks. It's a dangerous job but somebody has to do it -
        Super Chicken.
----- Original Message ----- 
From: "James Bucanek" <[EMAIL PROTECTED]>
To: <users@spamassassin.apache.org>
Sent: 2005 June, 10, Friday 08:06
Subject: Advice for a weekend spam assassin?


Greetings,

I consider myself a "weekend" spam assassin.  I run my own server
(co-located), and have about a dozen users (mostly friends and family, but a
few paying customers).  But running a mail server isn't my day job.  I don't
run Razor or any of the cooperative spam filters simply because I didn't
have the time to figure them out and set them up.

I'm running SpamAssassin 3.0.2 which I installed a few months ago.

SA is still only catching about 50-75% of the spam.  I've set up Bayes learn
ham/spam mailboxes, and I regularly feed them 200 to 500 messages a day.
Yet even after months of training, I still get messages like this:

    Subject: (6/10/05) Mortgage Rate Report
    X-Spam-Status: No, score=3.6 required=7.0 tests=BAYES_99,HTML_80_90,
        HTML_FONT_TINY,HTML_IMAGE_RATIO_04,HTML_MESSAGE,NORMAL_HTTP_TO_IP,
        OPTING_OUT autolearn=no version=3.0.2

As you can see, the Bayes filter has nailed it as spam, but it still only
gets a score of 3.6.
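
For anyone who wants to check headers like this in bulk, here is a minimal
Python sketch that pulls the score, the threshold, and the rule names out of
the X-Spam-Status header quoted above. The function name is made up for the
example, and it assumes a message is marked as spam when its score is greater
than or equal to the threshold.

```python
import re

# Header copied from the message above, including its folded lines.
HEADER = (
    "X-Spam-Status: No, score=3.6 required=7.0 tests=BAYES_99,HTML_80_90,\n"
    "    HTML_FONT_TINY,HTML_IMAGE_RATIO_04,HTML_MESSAGE,NORMAL_HTTP_TO_IP,\n"
    "    OPTING_OUT autolearn=no version=3.0.2"
)


def parse_spam_status(header: str) -> dict:
    """Extract score, threshold, and rule names from an X-Spam-Status header."""
    # Unfold continuation lines into a single space-separated line.
    unfolded = " ".join(header.split())
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", unfolded).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", unfolded).group(1))
    # The rule list sits between "tests=" and the "autolearn=" field.
    tests = re.search(r"tests=(.*?) autolearn=", unfolded).group(1)
    names = [name.strip() for name in tests.split(",") if name.strip()]
    return {"score": score, "required": required, "tests": names}


status = parse_spam_status(HEADER)
# Compare against the current threshold and the 5.0 under consideration.
for threshold in (status["required"], 5.0):
    verdict = "spam" if status["score"] >= threshold else "not spam"
    print(f"threshold {threshold}: {verdict}")
```

At 3.6 points this message stays under both 7.0 and 5.0, so lowering the
threshold alone would not have caught it.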

I currently have my threshold set to 7.0.  I've been considering lowering it
again (maybe to 5.0), but am paranoid about false positives.  I can go
through my mailbox and see ham that has scores of 3 or even 4.

I was hoping that someone here could give me some quick advice as to what I
might be doing wrong, or point me to a trouble-shooting site for SA.

I was previously using a client-side Bayes filtering system and was getting
99.8+% spam identification rates.  SA has been, so far, a bit of a
disappointment and I'm sure it's my fault.  :)

-- 
James Bucanek <mailto:[EMAIL PROTECTED]>
