--- On Tue, 3/23/10, Steve Kemp <st...@steve.org.uk> wrote:

> From: Steve Kemp <st...@steve.org.uk>
> Subject: Re: Rabin-Karp Spam Filter
> To: "Michael Papet" <mpa...@yahoo.com>
> Cc: qpsmtpd@perl.org
> Date: Tuesday, March 23, 2010, 8:27 AM
>
> On Tue Mar 23, 2010 at 08:18:01 -0700, Michael Papet wrote:
>
> > I have been slowly working on implementing a spam filter
> > that analyzes the title of the email and uses a Rabin-Karp
> > hashing algorithm to classify the email.
>
> This seems doomed to failure, unless it is combined with
> other characteristics.
>
> Certainly it will catch the "XX% of pfitzer" mails which
> go round constantly.
That's the point. It catches mail that somehow makes it through the Bayesian classification systems as uncertain. I'm not trying to beat spamassassin. I'm trying to address its weaknesses. I'll figure out ways to analyze the body and some other things on messages with almost no content.

Again, I don't want to improve on spamassassin. I want to address the limitations. Generally speaking, Bayesian filters seem to fail on bodies with little text in them. Is that an accurate observation?

> But consider the case of semi-generic mail subjects such
> as "You have a new message."

This is a more complex problem. Maybe one day I can get to that one.

Thanks for the feedback.

mpapet
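
For anyone following the thread, here is a minimal sketch of how a Rabin-Karp rolling hash could be used to look for known phrases in a subject line. It is not the filter discussed above and not a qpsmtpd plugin. The function names, the BASE and MOD constants, and the phrase list are all hypothetical, chosen only for illustration.

```python
# Hypothetical constants: any reasonable base and large prime modulus would do.
BASE = 256
MOD = 1_000_000_007


def rabin_karp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Weight of the leading character in a window of length m.
    high = pow(BASE, m - 1, MOD)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        t_hash = (t_hash * BASE + ord(text[i])) % MOD

    for i in range(n - m + 1):
        # Compare the actual strings only on a hash match, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return -1


def subject_looks_spammy(subject: str, phrases: list[str]) -> bool:
    """Case-insensitive check of a subject against a (hypothetical) phrase list."""
    lowered = subject.lower()
    return any(rabin_karp_find(lowered, p.lower()) != -1 for p in phrases)


if __name__ == "__main__":
    phrases = ["% off pfizer"]  # illustrative phrase only
    print(subject_looks_spammy("Save 80% off Pfizer today", phrases))
    print(subject_looks_spammy("You have a new message", phrases))
```

A fixed phrase list like this only covers the constantly recurring subjects. Semi-generic subjects such as "You have a new message" would need other signals, as noted in the thread.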