Robert L Mathews said:
> I'd definitely recommend including phone numbers as well as URLs and
> e-mail addresses. I have a manual content blocking list that I maintain
> for egregious spammers, and I've found that the most spam is blocked by
> URLs, followed by phone numbers, then e-mail addresses. I think this is
> just because of the relative permanency of these: a spammer who has
> bought a domain name or a phone number can't ditch it as quickly as he
> can sign up for a new free e-mail address.

that was my gut feeling, it's handy to have stats to back that up.

> The main problem with phone numbers in my scheme is that spammers
> disguise the format by writing idiotic things like "88 8 - 729 89 76",
> which makes it harder for my current postfix content filter to catch 'em
> unless I write CPU-intensive rules for each phone number. Of course,
> smarter code would strip out the extraneous characters before trying an
> lookup.

I reckon as long as the KSIbl just maps the phone numbers, URLs etc. in
their "native", undisguised format, then we can make the code as smart as
necessary to catch any obscure obfuscations. And SpamAssassin, being
written in perl, can do a lot of tricks to get around those
obfuscations ;)

> And as I mentioned offlist to Jason, I think the hardest part of this

*ahem* -- Justin ;)

> system is automating the submission process. The trouble is that spammers
> do sometimes include other people's e-mail addresses and so forth in
> their spam -- for example, I get plenty of spam saying "your site
> www.tigertech.net is not listed in search engines!" -- and an automated
> process would presumably tag that.

interesting point, hadn't thought of that.

> Probably the solution there is to simply not list an indicator unless
> it's been reported by multiple people, as you suggested, and to increase
> the resulting weight as it gets reported by more and more people.
>...
> I also agree that just removing anyone who asks is a good idea; I doubt
> it would become a problem.

If this was implemented as a server-side whitelist, then that also helps
with the "other people's pointers" problem; if an incorrect pointer is
found, you can request the KSI never blacklist that pointer again, then
the problem is solved. (ish. ;)

> >- trim off any text after "?" or "&" (to avoid URLs like
> > http://foo.com/?49435 and http://foo.com/?9438 being treated as
> > differing, when they are not)
>
> Sometimes the part on the end is an affiliate ID, and only one affiliate
> is spamming, so it might be useful to retain in some cases.

It would be worth researching some stats on this, based on a large enough
corpus... but it can wait ;)

--j.

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
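
For what it's worth, here is a rough sketch of the normalisation and weighting ideas from this thread, in Python rather than perl just to keep it short. None of it is existing KSI or SpamAssassin code: the function names, the `keep_query` switch (for the affiliate-ID case), the `min_reporters` threshold of 2 and the "weight = number of reporters" rule are all assumptions made up for illustration.

```python
from urllib.parse import urlsplit
import re


def normalize_phone(text):
    """Strip everything but digits, so disguised numbers compare equal."""
    return re.sub(r"\D", "", text)


def canonical_url(url, keep_query=False):
    """Reduce a URL to host + path.

    By default any text after "?" or "&" is trimmed; keep_query=True
    (hypothetical switch) retains it, e.g. to tell affiliate IDs apart.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""      # urlsplit lower-cases the hostname
    path = parts.path
    if not keep_query:
        path = path.split("&", 1)[0]  # "?" was already split off as query
    key = host + path.rstrip("/")
    if keep_query and parts.query:
        key += "?" + parts.query
    return key


def indicator_weight(indicator, reports, whitelist, min_reporters=2):
    """Weight for one indicator (assumed rule: one point per reporter).

    reports maps indicator -> set of reporter IDs. Whitelisted indicators,
    and those seen by fewer than min_reporters people, score 0.
    """
    if indicator in whitelist:
        return 0.0
    n = len(reports.get(indicator, ()))
    return float(n) if n >= min_reporters else 0.0


if __name__ == "__main__":
    print(normalize_phone("88 8 - 729 89 76"))
    print(canonical_url("http://foo.com/?49435"))
    print(canonical_url("http://foo.com/?49435", keep_query=True))
```

The database would then only ever store the "native" form (digits only, host plus path), and the filter applies the same functions to whatever it pulls out of a message before doing the lookup. Whether dropping the query string is the right default is exactly the affiliate-ID question above, so treat `keep_query` as a placeholder until there are stats.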