On Thu, 25 Aug 2005 13:22:53 -0400, François Pinard wrote:

>[David MacQuigg]
>
>> The key new features needed in a spam filter are the ability to
>> extract the sender's identity (not that of the latest forwarder), and
>> to factor into the spam score the reputation of that identity.
>
>This will only work if your system is immune to forgeries, while being
>largely widespread.

Stopping forgery is what the new authentication methods are all about.
Getting these methods widely and effectively used is our big challenge,
and one that I hope to meet with my efforts.  There are a bunch of
pieces that need to work together more smoothly.  That's where Python
comes in.  There are some challenging constraints; for example, the
system has to work without government regulation.

I've got a first draft of a website for open-mail.org - temporarily at
http://purl.net/macquigg/email/registry
Suggestions are welcome.

>> In the flow we envision, the spam filter is the final process, used
>> only on the 5% that is hard to classify.  80% will get an immediate
>> reject.  15% will get an immediate accept without filtering, because
>> the sender is authenticated and has a good reputation.  Eventually,
>> all reputable senders will join the 15%, and the 5% will shrink to
>> where we can ignore it.
>
>It's fun to read statistics about a vision! :-)

The 80% is real.  http://messagelabs.com/emailthreats
As to how the remaining 20% will split, that's a guess, but one that I
think is realistic.  See
http://www.spamhaus.org/effective_filtering.html
for comparable numbers using only IP blacklists and spam filtering.

The 5% still needing filtering will be those senders that don't offer
any authentication or that authenticate with an identity that has not
yet acquired a reputation.

>> >You might find www.spambayes.org of interest, in several ways.
>
>Spambayes is surprisingly good as it already stands.

I haven't used Spambayes, but my experience with Spamnix (an offshoot
of SpamAssassin) is that statistical filters always have a few false
rejects.  In my case, that's about two per week.  The solution to this
problem is a reliable system allowing receivers to determine the
identity and reputation of an unknown sender.  Then we can safely
ignore the spam.

-- Dave

-- 
http://mail.python.org/mailman/listinfo/python-list
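
For anyone who wants to see the reject / accept / filter flow described
above in concrete terms, here is a minimal Python sketch.  Everything in
it is an assumption made up for illustration: the function name
`triage`, the "pass" / "fail" / "none" authentication results, the
reputation table keyed by identity, and the two thresholds.  It also
assumes that the immediate rejects come from failed authentication or a
bad reputation, which the message does not spell out.  It is not the
API of any existing authentication method or registry.

```python
from typing import Dict, Optional


def triage(identity: Optional[str], auth: str,
           reputation: Dict[str, float],
           good: float = 0.8, bad: float = 0.2) -> str:
    """Return 'reject', 'accept', or 'filter' for one incoming message.

    identity   -- the sender identity claimed by the message, if any
    auth       -- hypothetical authentication result: 'pass', 'fail', 'none'
    reputation -- hypothetical table mapping identity -> score in [0, 1]
    """
    if auth == "fail":
        return "reject"      # claimed identity did not authenticate (assumed forgery)
    if auth != "pass" or identity is None:
        return "filter"      # no authentication offered: let the spam filter decide
    score = reputation.get(identity)
    if score is None:
        return "filter"      # authenticated, but no reputation acquired yet
    if score <= bad:
        return "reject"      # authenticated sender with a bad reputation
    if score >= good:
        return "accept"      # authenticated and reputable: skip filtering
    return "filter"          # middling reputation: fall back to the filter


# Example table with made-up identities and scores.
scores = {"good.example": 0.95, "bad.example": 0.05}
for ident, result in [("good.example", "pass"), ("bad.example", "pass"),
                      ("new.example", "pass"), (None, "none"),
                      ("good.example", "fail")]:
    print(ident, result, "->", triage(ident, result, scores))
```

Only messages that come back as 'filter' would be handed to a
statistical filter, so that filter sees just the hard-to-classify
remainder.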