On Tue Mar 23, 2010 at 08:18:01 -0700, Michael Papet wrote:

> I have been slowly working on implementing a spam filter
> that analyzes the title of the email and uses a Rabin-Karp
> hashing algorithm to classify the email.

  This seems doomed to failure, unless it is combined with
 other characteristics.

  Certainly it will catch the "XX% off Pfizer" mails which
 go round constantly.

  But consider the case of semi-generic mail subjects such
 as "You have a new message."  I see such mails from many
 sites; sometimes they are legit (facebook/modelmayhem/whatever)
 and other times they're spam.

  Even combining the subject match with the envelope sender, or
 the IP of the sender, might make it more useful.  But right now
 I think that simple pattern matching on the incoming mail subject
 is going to have too many false positives, not least because
 spam senders can and do copy subjects from legitimate sources.
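
  As a rough sketch of what I mean by combining signals (Python;
 the phrase list, the IP set, and the names rk_find and classify
 are all made up for illustration, and this is not your actual
 implementation): a subject hit on its own only marks the mail
 as "unsure", and it takes a second signal to call it spam.

```python
# Sketch only: SPAM_PHRASES, BAD_IPS, rk_find and classify are hypothetical.
BASE = 256
MOD = 1_000_000_007


def rk_find(text, pattern):
    """Return True if pattern occurs in text, using a Rabin-Karp rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0:
        return True
    if m > n:
        return False
    high = pow(BASE, m - 1, MOD)  # weight of the leading character
    ph = th = 0
    for i in range(m):
        ph = (ph * BASE + ord(pattern[i])) % MOD
        th = (th * BASE + ord(text[i])) % MOD
    for i in range(n - m + 1):
        # On a hash hit, compare the strings to rule out a collision.
        if ph == th and text[i:i + m] == pattern:
            return True
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            th = ((th - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return False


SPAM_PHRASES = ["you have a new message", "% off"]
BAD_IPS = {"192.0.2.10"}  # documentation-range address, not a real listing


def classify(subject, sender_ip):
    """Combine the subject match with a second signal (the sender IP)."""
    subject = subject.lower()
    subject_hit = any(rk_find(subject, p) for p in SPAM_PHRASES)
    if subject_hit and sender_ip in BAD_IPS:
        return "spam"
    if subject_hit:
        return "unsure"  # the subject alone is not enough evidence
    return "ham"
```

  With that, classify("You have a new message.", "198.51.100.7")
 comes back as "unsure" rather than "spam", which is the point:
 the generic subject by itself should not condemn the mail.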

Steve
--
