I've been thinking about a new rule, either as input to Bayes or as a regular scoring rule, and I'd like the group's opinion. It has to do with URLs in the message body.

The idea came to me while running SpamCop on a batch of messages. Looking at the SpamCop output, I see that it runs whois on the IP address of each URL's host to find the email address of the netblock owner. Running the same lookups myself shows that the linked hosts sit in netblocks in China (surprise, surprise). My question is: can we use this information, either directly or through Bayes, to help predict whether a message is spam?
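
SpamCop's own parsing isn't visible from its output, so here is only a minimal Python sketch of the "find the owner" step: given the raw text of a whois reply, pull out the contact addresses and the two-letter country code. The helper name `summarize_whois` is hypothetical, and the field names it looks for (`country:` in RIPE/APNIC style, `Country:` in ARIN style) are an assumption that won't cover every registry's format.

```python
import re

# Hypothetical helper. Field layouts differ between registries, so the
# country match is case-insensitive and anchored to the start of a line.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
COUNTRY_RE = re.compile(r"^country:\s*([A-Za-z]{2})\s*$", re.IGNORECASE | re.MULTILINE)


def summarize_whois(text: str) -> dict:
    """Return the distinct e-mail addresses and the first country code in a whois reply."""
    emails = []
    for addr in EMAIL_RE.findall(text):
        addr = addr.lower()
        if addr not in emails:  # keep first-seen order, drop duplicates
            emails.append(addr)
    match = COUNTRY_RE.search(text)
    return {"emails": emails, "country": match.group(1).upper() if match else None}
```

A country of `CN`, or a particular abuse address, could then serve as a rule condition or be handed to Bayes as an extra token.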

I'm sure something like URI::URL could be used to get the host of each URL, and that could in turn be fed to a whois server (although I'm not yet sure how one chooses between ARIN, APNIC, and the like). But I'm not enough of a Perl guy to know how to do it all, especially when it comes to tying it into the Bayes system.
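
As a rough sketch of the lookup itself, here is a Python version, with `urlsplit` playing the part URI::URL would play in Perl. It assumes that whois.iana.org answers an IP query with a `refer:` line naming the responsible registry (ARIN, APNIC, RIPE, and so on), which is one way around the "which server?" question. All function names are hypothetical, the protocol handling is the bare RFC 3912 form (TCP port 43, query plus CRLF, read until close), and only IPv4 is handled.

```python
import re
import socket
from typing import List, Optional
from urllib.parse import urlsplit

# Rough URL matcher: scheme plus everything up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_hosts(body: str) -> List[str]:
    """Return the distinct host names of http/https URLs in a message body."""
    hosts: List[str] = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname  # lower-cased; port and user info dropped
        except ValueError:  # malformed URL, e.g. a broken [IPv6] literal
            continue
        if host:
            host = host.rstrip(".")  # drop a sentence-ending dot
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one RFC 3912 whois query and return the whole reply as text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(iana_reply: str) -> Optional[str]:
    """Pick the registry named on a 'refer:' line (assumed IANA reply format)."""
    match = re.search(r"^refer:\s*(\S+)", iana_reply, re.MULTILINE)
    return match.group(1) if match else None


def netblock_whois(host: str) -> str:
    """Resolve a host (IPv4 only) and fetch the whois record for its netblock."""
    ip = socket.gethostbyname(host)
    iana_reply = whois_query("whois.iana.org", ip)
    rir = referral_server(iana_reply)
    return whois_query(rir, ip) if rir else iana_reply
```

Something like `for host in url_hosts(body): print(netblock_whois(host))` would print one reply per distinct host. Every call goes out to the network, two queries per host, so this is exactly the load the whois servers would see without a cache.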

I suspect a system like this could help target a fair amount of spam, but I'm not sure what it would do to the poor whois servers (I notice SpamCop caches whois listings itself). Does anybody care to offer suggestions?
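
On the load question, a small time-based cache along SpamCop's lines would mean each address is looked up at most once per TTL. Below is a minimal Python sketch; the class name, the one-day default TTL, and keying on the raw IP address are all assumptions, and the `lookup` argument stands in for whatever actually talks to the whois server.

```python
import time
from typing import Callable, Dict, Tuple


class WhoisCache:
    """Remember whois replies for `ttl` seconds so repeated addresses skip the server."""

    def __init__(self, lookup: Callable[[str], str], ttl: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # function that actually queries whois (hypothetical)
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, address: str) -> str:
        now = self._clock()
        cached = self._entries.get(address)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no network traffic
        reply = self._lookup(address)
        self._entries[address] = (now, reply)
        return reply
```

Keying on the netblock range from the reply, rather than on the single IP, would cut queries further, at the cost of parsing the range out of each registry's format.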

Brad



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
