1) You need to visit http://www.rulesemporium.com/ and select at least a few of the SARE rule sets. They really do help SA performance.

2) I found best results here when I bumped the BAYES_99 rule up to 5 points. So far I have not seen that trigger on a ham message with per-user Bayes. That per-user Bayes is important. Shared Bayes is not nearly as effective and should be banned in Boston - and the rest of the world, too. It's a cop-out. Users MUST be prepared to help by training their personal filters. Otherwise they must accept increased spam escapes.

3) 3.0.4 is out. It installs nicely. (But give it a lot of time for some of its tests. On my first shot at a CPAN install I thought it had died or locked up on a couple of tests.)

4) 5 is a good threshold. NEVER discard messages marked as spam unless you do this at a rather high markup level. (SARE rules help make THAT happen.) A subject markup that includes the spam score is handy for the users. (I use a three digit markup since I have seen really nasty messages rack up 100 point scores here - on small score rules.) Then the user can feed *** SPAM(099) *** messages into a spam folder by sorting on the "*** SPAM" part. They should review the contents before discarding. Sort the mailbox alphabetically and look at the low scores briefly - a minute suffices for me even when I see something peculiar I want to make sure is already properly Bayesed. (You can verb ANY noun. {^_-})

5) For children's accounts modify the procedure so that the parents can vet the mail and drop any false markups into their children's folders. If the parents take a little extra time they can take the falsely marked message and extract the real message attachment to put in the child's mailbox. That part is up to them.

6) Do NOT use autolearn or autowhitelist. The idea is intriguing but I see too many busted Bayes databases from abuse of those tools. I train manually rather seldom. About every 6 months I remember to run some random batches of ham through the ham training. Every time I see a very low score spam (or an escaped spam) with low Bayes I train on those messages. Otherwise I just let it perk along doing its thing. I do use wetware Bayes phrase filtering, better known as the SARE rule sets, and update them periodically.
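
For anyone who wants the settings from items 2, 4 and 6 in one place, here is a rough sketch of what they might look like in local.cf. The option names are as I understand them for SA 3.0.x, so check the Mail::SpamAssassin::Conf documentation for your version before copying. The _SCORE(00)_ padding tag is only an assumption about how a zero-padded score gets into the subject; it renders with a decimal (something like 012.3), so it will not look exactly like the SPAM(099) markup described above.

```
# local.cf sketch - values come from the advice above, not from a tested setup

# Item 4: mark as spam at 5 points
required_score 5.0

# Item 2: raise the BAYES_99 score to 5 points
score BAYES_99 5.0

# Item 4: put a zero-padded score in the subject so users can sort on it
# (assumption: the _SCORE(00)_ template tag is available in your version)
rewrite_header Subject *** SPAM(_SCORE(00)_) ***

# Item 6: no autolearn, no autowhitelist
bayes_auto_learn 0
use_auto_whitelist 0
```

Per-user Bayes needs no line here as far as I know; it is what you get when each user runs against their own Bayes database rather than a shared one.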
Practical results: About 1 escaped spam a day out of 300+ spams. About 2 mismarks a day, chiefly from the Linux Kernel Mailing List. (Patch sets and bug reports with dumps confuse the SARE rules.) (And sometimes AOL mails come through mismarked because they yet again screwed up their server configuration.)

Specifically: Yesterday out of 700+ messages I had no escaped spam and 3 mismarked LKML messages. In the last 9 hours I've already had one Mexican language spam get through. That may be my escaped spam for the day or I might get another. No ham has been mismarked.

{^_^}

Thus be Joanne's configuration du jour. By the way, I use some 43 of the SARE and other rule sets. I go a trifle overboard, methinks. It's a dangerous job but somebody has to do it - Super Chicken.

----- Original Message -----
From: "James Bucanek" <[EMAIL PROTECTED]>
To: <users@spamassassin.apache.org>
Sent: 2005 June, 10, Friday 08:06
Subject: Advice for a weekend spam assassin?

Greetings,

I consider myself a "weekend" spam assassin. I run my own server (co-located), and have about a dozen users (mostly friends and family, but a few paying customers). But running a mail server isn't my day job. I don't run Razor or any of the cooperative spam filters simply because I didn't have the time to figure them out and set them up.

I'm running Spamassassin 3.0.2 which I installed a few months ago. SA is still only catching about 50-75% of the spam. I've set up Bayes learn ham/spam mailboxes, and I regularly feed them 200 to 500 messages a day. Yet even after months of training, I still get messages like this:

Subject: (6/10/05) Mortgage Rate Report
X-Spam-Status: No, score=3.6 required=7.0 tests=BAYES_99,HTML_80_90,
        HTML_FONT_TINY,HTML_IMAGE_RATIO_04,HTML_MESSAGE,NORMAL_HTTP_TO_IP,
        OPTING_OUT autolearn=no version=3.0.2

As you can see, the Bayes filter has nailed it as spam, but it still only gets a score of 3.6. I currently have my threshold set to 7.0.
I've been considering lowering it again (maybe to 5.0), but am paranoid about false positives. I can go through my mailbox and see ham that has scores of 3 or even 4.

I was hoping that someone here could give me some quick advice as to what I might be doing wrong, or point me to a trouble-shooting site for SA. I was previously using a client-side Bayes filtering system and was getting 99.8+% spam identification rates. SA has been, so far, a bit of a disappointment and I'm sure it's my fault. :)

-- 
James Bucanek <mailto:[EMAIL PROTECTED]>