At 7/9/02 6:36 PM, Mark Reynolds wrote:

>Hi Robert and Justin,
>
>I've documented the original idea (well, saved some emails :-)
>at http://bl.reynolds.net.au/ksi/
>
>I've been focusing on learning who to run a dnsbl service,
>scanning, and integrating it all together. Current system
>was spread over 4 servers, so I'm also merging it all onto
>one.
>
>http://bl.reynolds.net.au/
>
>So I think I've got that sussed now, and will be moving onto
>the ksi project in the next few months.

Sounds excellent ("Key Spam Indicators" is a good name, too -- I couldn't think of one). I'd appreciate it if you could post progress reports so I (and others) could help out where it would be useful. (I could also provide some US bandwidth and a domain name.)

I'd definitely recommend including phone numbers as well as URLs and e-mail addresses. I have a manual content blocking list that I maintain for egregious spammers, and I've found that URLs block the most spam, followed by phone numbers, then e-mail addresses. I think this is just because of the relative permanency of these: a spammer who has bought a domain name or a phone number can't ditch it as quickly as he can sign up for a new free e-mail address.

The main problem with phone numbers in my scheme is that spammers disguise the format by writing idiotic things like "88 8 - 729 89 76", which makes it harder for my current postfix content filter to catch 'em unless I write CPU-intensive rules for each phone number. Of course, smarter code would strip out the extraneous characters before trying a lookup.

And as I mentioned offlist to Jason, I think the hardest part of this system is automating the submission process. The trouble is that spammers do sometimes include other people's e-mail addresses and so forth in their spam -- for example, I get plenty of spam saying "your site www.tigertech.net is not listed in search engines!" -- and an automated process would presumably tag that.

Obviously, people submitting spam could be presented with a list of contact info and then uncheck any that isn't related to the spammer, as SpamCop does, but people frequently screw up the SpamCop reports (I've accidentally reported myself a couple of times). In addition, it would be nice to accept reports based on a spamassassin -r report, which isn't interactive (and also to parse input from spamtraps, NANAE feeds, and so forth).

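On the phone number point above, here is a rough sketch of what "strip out the extraneous characters before trying a lookup" might look like. It is only an illustration, not anything that exists in my filter or in the ksi project: the function names, the candidate-matching regex, and the idea of holding the blocklist as a set of digit strings are all assumptions made for the example.

```python
import re

def normalize_phone(raw):
    """Reduce a phone number to bare digits so disguised variants compare equal."""
    return re.sub(r"\D", "", raw)

# Hypothetical candidate pattern: a digit, then at least five more digits or
# common separators (spaces, dashes, dots, parentheses), ending on a digit.
CANDIDATE = re.compile(r"\d[\d\s().\-]{5,}\d")

def find_listed_numbers(text, blocklist):
    """Return the blocklisted numbers that appear in text in any spacing.

    blocklist is assumed to be a set of digit-only strings.
    """
    hits = set()
    for candidate in CANDIDATE.findall(text):
        digits = normalize_phone(candidate)
        if digits in blocklist:
            hits.add(digits)
    return hits

if __name__ == "__main__":
    listed = {"8887298976"}
    print(find_listed_numbers("Call 88 8 - 729 89 76 now!", listed))
```

The point is that normalization happens once per candidate, so the cost no longer grows with one hand-written rule per phone number. A real version would also have to decide how to treat country codes and numbers spelled out in words, which this sketch ignores.
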
Probably the solution to the bad-report problem is to simply not list an indicator unless it's been reported by multiple people, as you suggested, and to increase the resulting weight as it gets reported by more and more people. Obviously, then, low rankings should be taken with a grain of salt.

It might be a good idea to run two separate RBL facilities: one that returns a weight (for clients smart enough to deal with that), and one that just returns a yes/no answer, based on whether the weight exceeds a certain level, for less intelligent clients.

I also agree that just removing anyone who asks is a good idea; I doubt it would become a problem.

Finally, an observation about:

>- trim off any text after "?" or "&" (to avoid URLs like
>  http://foo.com/?49435 and http://foo.com/?9438 being treated as
>  differing, when they are not)

Sometimes the part on the end is an affiliate ID, and only one affiliate is spamming, so it might be useful to retain it in some cases. (On the other hand, if one affiliate is spamming, it's probably some kind of shady scheme that will just encourage other idiots to do it as well, so perhaps it doesn't matter so much.)

------------------------------------
Robert L Mathews, Tiger Technologies

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk