> > Did you read the original article?  He claims to be _more_ accurate than
> > SA while still doing header-content-only tests (not DNSbl).  Of course, I
> > don't know whether that includes blocking IP ranges with a private list.

I have seen a lot of claims that filter brand X is accurate at some
insanely good five-nines accuracy level.  But strangely, whenever I
try it on my own mailbox it never works as well for me as the original
author claims it works for them.  I ask the rhetorical question, "Why
is that?"

The problem is that they train on historical data.  If I train on my
spam box with almost any filtering scheme, including hand-written
procmail rules, then I can almost always approach 100% accuracy.  But
as new spam comes in I have to either retrain or rewrite the procmail
rules.  And if I gave those rules to you, they would certainly not
work as well for you as they work for me.  No single person's mailbox
is a representative spam sample of all email for all people.
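To make the overfitting point concrete, here is a minimal sketch in
plain Python (the messages and the keyword filter are made up for
illustration, not any real filter's rules): a filter "trained" on an
old spam corpus scores perfectly on that corpus but falls apart on
newer spam that uses different wording.

    # Hypothetical illustration: rules derived from old spam look
    # perfect on the mail they were built from, but miss newer spam.

    # Old spam the rules were derived from (made-up examples).
    old_spam = [
        "buy cheap viagra now",
        "lowest mortgage rates guaranteed",
        "viagra special offer just for you",
    ]

    # Newer spam that mutated its wording (also made up).
    new_spam = [
        "v1agra best pr1ce",
        "refinance your home today",
        "exclusive deal click here",
    ]

    # "Training": collect every word seen in the old spam, much like
    # hand-tuning procmail rules against an existing spam box.
    keywords = {word for msg in old_spam for word in msg.split()}

    def is_spam(message):
        # Flag a message if it contains any keyword from the old corpus.
        return any(word in keywords for word in message.split())

    def hit_rate(messages):
        return sum(is_spam(m) for m in messages) / len(messages)

    print("caught in old spam: %.0f%%" % (100 * hit_rate(old_spam)))  # 100%
    print("caught in new spam: %.0f%%" % (100 * hit_rate(new_spam)))  # 0%

The same effect shows up with any scheme that is only ever evaluated
on the mail it was tuned against; the honest number is the one
measured on mail that arrived after the rules were written.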

This problem is similar to stock market prediction programs, or any
other attempt to predict random noise.  I can curve-fit any historical
data to 100% accuracy.  But the real test is how well a filter works
as new trends mutate and diverge from the historical data.  That is
the test that separates good spam filters from bad ones.

I used to write all of my own spam filtering tests.  I still have a
few since I can improve on the SA rules *for my particular spam*.  I
still use RBLs extensively.  I have been pretty aggressive at filtering
in general.  But all of that takes a lot of continuous effort to keep
up to date as spam trends change.

Personally, SpamAssassin has been working really well at accurately
tagging spam in my mailbox.  It only very rarely produces a false
positive, and the effort needed from me to maintain it is zero.
SpamAssassin just works.

Bob
