Re: Rabin-Karp Spam Filter

Michael Papet Tue, 23 Mar 2010 09:37:03 -0700

--- On Tue, 3/23/10, Steve Kemp <st...@steve.org.uk> wrote:

> From: Steve Kemp <st...@steve.org.uk>
> Subject: Re: Rabin-Karp Spam Filter
> To: "Michael Papet" <mpa...@yahoo.com>
> Cc: qpsmtpd@perl.org
> Date: Tuesday, March 23, 2010, 8:27 AM
> On Tue Mar 23, 2010 at 08:18:01 -0700, Michael Papet wrote:
> 
> > I have been slowly working on implementing a spam filter
> > that analyzes the title of the email and uses a Rabin-Karp
> > hashing algorithm to classify the email.
> 
>   This seems doomed to failure, unless it is combined with
>  other characteristics.
> 
>   Certainly it will catch the "XX% of pfitzer" mails which
>  go round constantly.


That's the point.  It catches mail that somehow makes it through the Bayesian
classification systems as uncertain.  I'm not trying to beat SpamAssassin.  I'm
trying to address its weaknesses.
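
The subject-line idea can be sketched roughly as follows.  This is only an
illustration in Python, not the plugin itself: the phrase list, the function
names, and the constants BASE and MOD are all assumptions made up for the
example.  It hashes every window of the subject with a Rabin-Karp rolling hash
and compares against the hashes of known spam phrases.

```python
# Hypothetical sketch: Rabin-Karp matching of known spam phrases in a subject.
BASE = 256               # assumed radix for the polynomial hash
MOD = 1_000_000_007      # assumed large prime modulus


def hash_of(s):
    """Polynomial hash of a whole string."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rolling_hashes(text, width):
    """Yield (start_index, hash) for every window of `width` characters."""
    if width <= 0 or len(text) < width:
        return
    high = pow(BASE, width - 1, MOD)  # weight of the character leaving the window
    h = hash_of(text[:width])
    yield 0, h
    for i in range(width, len(text)):
        h = (h - ord(text[i - width]) * high) % MOD  # drop the oldest character
        h = (h * BASE + ord(text[i])) % MOD          # append the newest character
        yield i - width + 1, h


def subject_matches(subject, phrases):
    """Return the known phrases that occur in the subject (case-insensitive)."""
    subject = subject.lower()
    # Group phrase hashes by phrase length so each length needs one pass.
    by_width = {}
    for p in phrases:
        p = p.lower()
        by_width.setdefault(len(p), {}).setdefault(hash_of(p), []).append(p)
    found = []
    for width, table in by_width.items():
        for start, h in rolling_hashes(subject, width):
            for p in table.get(h, ()):
                # Compare the actual text to rule out hash collisions.
                if subject[start:start + width] == p and p not in found:
                    found.append(p)
    return found
```

Calling subject_matches("Save 80% off Pfizer today", ["% off pfizer"]) would
flag the subject.  Exact phrase matching like this only helps with subjects
that repeat nearly verbatim, which is the narrow case I'm after.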

I'll figure out ways to analyze the body and some other things on messages with
almost no content.  Again, I don't want to improve on SpamAssassin.  I want to
address its limitations.  Generally speaking, Bayesian filters seem to fail on
bodies with little text in them.  Is that an accurate observation?

>   But consider the case of semi-generic mail subjects such
>  as "You have a new message."
This is a more complex problem.  Maybe one day I can get to that one.

Thanks for the feedback.

mpapet
