> > Did you read the original article? He claims to be _more_ accurate than
> > SA while still doing header-content-only tests (not DNSbl). Of course, I
> > don't know whether that includes blocking IP ranges with a private list.
I have seen a lot of claims that filter brand X is accurate at some insanely good five-nines accuracy level. But strangely, whenever I try it on my own mailbox it never works as well for me as the original author claims it works for them. I ask the rhetorical question, "Why is that?"

The problem is that they train on historical data. If I train on my spam box with almost any filtering scheme, including hand-written procmail rules, then I can almost always approach 100% accuracy. But as new spam comes in I have to either retrain or rewrite the procmail rules. And if I gave those rules to you, they would certainly not work as well for you as they work for me. No single person is a representative spam sample of all email for all people. This problem is similar to stock market prediction programs, or any other attempt to predict a noisy process: I can curve-fit any historical data 100%. But the real test is how well filter programs work as new trends mutate and diverge from the historical data. This is the test that separates good spam filters from bad.

I used to write all of my own spam filtering tests. I still have a few, since I can improve on the SA rules *for my particular spam*. I still use RBLs extensively. I have been pretty aggressive at filtering in general. But all of that takes a lot of continuous effort to keep up to date as spam trends change. Personally, SpamAssassin has been working really well at accurately tagging spam in my box, it only very rarely has a false positive, and the effort needed by me to maintain it is zero. SpamAssassin just works.

Bob
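To make the overfitting point above concrete, here is a minimal sketch (not Bob's actual setup) of the difference between in-sample accuracy and accuracy on later mail. It assumes scikit-learn, a simple word-count naive Bayes model, and hypothetical Mail/spam and Mail/ham folders; any filtering scheme trained and scored this way shows the same gap.

```python
# Sketch: compare accuracy on the mail a filter was trained on versus
# accuracy on mail that arrived afterwards, using a chronological split.
# The folder paths and the model choice are assumptions for illustration.
import glob
import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def load(pattern, label):
    """Return (mtime, text, label) tuples for every message matching pattern."""
    out = []
    for path in glob.glob(pattern):
        with open(path, errors="replace") as f:
            out.append((os.path.getmtime(path), f.read(), label))
    return out

# Hypothetical corpora: one folder of known spam, one of known ham.
msgs = load("Mail/spam/*", 1) + load("Mail/ham/*", 0)
msgs.sort()                      # oldest messages first
split = len(msgs) // 2
old, new = msgs[:split], msgs[split:]

vec = CountVectorizer()
X_old = vec.fit_transform(text for _, text, _ in old)
y_old = [label for _, _, label in old]
X_new = vec.transform(text for _, text, _ in new)
y_new = [label for _, _, label in new]

clf = MultinomialNB().fit(X_old, y_old)
# The "five nines" number: accuracy on the historical mail it was fit to.
print("accuracy on historical mail:", clf.score(X_old, y_old))
# The number that matters: accuracy on mail that arrived later.
print("accuracy on newer mail:     ", clf.score(X_new, y_new))
```

The first figure is the curve-fit number that looks impressive in announcements; the second, measured on mail the filter never saw, is what tracks how the filter holds up as spam trends change.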