On Tue Mar 23, 2010 at 08:18:01 -0700, Michael Papet wrote:

> I have been slowly working on implementing a spam filter
> that analyzes the title of the email and uses a Rabin-Karp
> hashing algorithm to classify the email.

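As a rough illustration of the kind of subject matching being described, here is a minimal Rabin-Karp sketch. It is an assumption about the general approach, not the poster's actual code: the names rabin_karp_find and subject_is_spam, the phrase list, and the hash constants are all hypothetical.

```python
# Hypothetical sketch: Rabin-Karp search for known spam phrases in a subject.
# BASE and MOD are arbitrary illustrative choices, not taken from the thread.
BASE = 256
MOD = 1_000_000_007


def rabin_karp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(BASE, m - 1, MOD)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        t_hash = (t_hash * BASE + ord(text[i])) % MOD

    for i in range(n - m + 1):
        # Compare the strings only on a hash hit, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * BASE
                      + ord(text[i + m])) % MOD
    return -1


def subject_is_spam(subject: str, phrases: list[str]) -> bool:
    """Flag a subject if it contains any known phrase (case-insensitive)."""
    lowered = subject.lower()
    return any(rabin_karp_find(lowered, p.lower()) != -1 for p in phrases)
```

Note that subject_is_spam("You have a new message", ["new message"]) returns True whoever sent the mail, since only the subject text is examined.
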
This seems doomed to failure unless it is combined with other
characteristics of the message.

Certainly it will catch the "XX% of pfitzer" mails which go round
constantly. But consider semi-generic subjects such as "You have a
new message." I see such mails from many sites; sometimes they are
legit (Facebook/ModelMayhem/whatever) and other times they're spam.

Even combining the subject with the envelope sender, or the IP of the
sending host, might make it more useful. But right now I think that
simple pattern matching on the incoming mail subject is going to
produce too many false positives, not least because spam senders can
and do copy subjects from legitimate sources.

Steve
--