At 7/9/02 6:36 PM, Mark Reynolds wrote:

>Hi Robert and Justin,
>
>I've documented the original idea (well, saved some emails :-)
>at http://bl.reynolds.net.au/ksi/
>
>I've been focusing on learning how to run a dnsbl service,
>scanning, and integrating it all together. Current system
>was spread over 4 servers, so I'm also merging it all onto
>one.
>
>http://bl.reynolds.net.au/
>
>So I think I've got that sussed now, and will be moving onto
>the ksi project in the next few months.

Sounds excellent ("Key Spam Indicators" is a good name, too -- I couldn't 
think of one). I'd appreciate it if you could post progress reports so I 
(and others) could help out where it would be useful. (I could also 
provide some US bandwidth and a domain name.)

I'd definitely recommend including phone numbers as well as URLs and 
e-mail addresses. I have a manual content blocking list that I maintain 
for egregious spammers, and I've found that the most spam is blocked by 
URLs, followed by phone numbers, then e-mail addresses. I think this is 
just because of the relative permanency of these: a spammer who has 
bought a domain name or a phone number can't ditch it as quickly as he 
can sign up for a new free e-mail address.

The main problem with phone numbers in my scheme is that spammers 
disguise the format by writing idiotic things like "88 8 - 729 89 76", 
which makes it harder for my current postfix content filter to catch 'em 
unless I write CPU-intensive rules for each phone number. Of course, 
smarter code would strip out the extraneous characters before trying a 
lookup.
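
A minimal sketch of that stripping step, assuming Python rather than a 
postfix rule. The names normalize_phone, BLOCKED_NUMBERS and 
is_blocked_phone are hypothetical, and the in-memory set merely stands in 
for whatever lookup the real service would use:

```python
import re

# Hypothetical blocklist, stored in normalized (digits-only) form.
BLOCKED_NUMBERS = {"8887298976"}


def normalize_phone(text: str) -> str:
    """Drop everything except ASCII digits, so spacing and punctuation
    tricks collapse to a single canonical form."""
    return re.sub(r"[^0-9]", "", text)


def is_blocked_phone(candidate: str) -> bool:
    """Look up the normalized number instead of matching each disguise."""
    return normalize_phone(candidate) in BLOCKED_NUMBERS


print(is_blocked_phone("88 8 - 729 89 76"))
```

With this, one stored entry covers every spacing variant of the same 
number, so no per-number pattern rules are needed.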

And as I mentioned offlist to Jason, I think the hardest part of this 
system is automating the submission process. The trouble is that spammers 
do sometimes include other people's e-mail addresses and so forth in 
their spam -- for example, I get plenty of spam saying "your site 
www.tigertech.net is not listed in search engines!" -- and an automated 
process would presumably tag that. Obviously, people submitting spam 
could be presented with a list of contact info and then uncheck any that 
isn't related to the spammer, as with SpamCop, but people frequently screw 
up the SpamCop reports (I've accidentally reported myself a couple of 
times). In addition, it would be nice to accept reports based on a 
spamassassin -r report, which isn't interactive (and also to parse input 
from spamtraps, NANAE feeds, and so forth).

Probably the solution there is to simply not list an indicator unless 
it's been reported by multiple people, as you suggested, and to increase 
the resulting weight as it gets reported by more and more people. 
Obviously, then, low rankings should be taken with a grain of salt. It 
might be a good idea to run two separate RBL facilities -- one of which 
returns a weight (for clients smart enough to deal with that), and one of 
which (for less intelligent clients) just returns a yes/no answer based 
on whether the weight exceeds a certain level.
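
Here is a rough sketch of how that weighting might look, again assuming 
Python. The class name, the two thresholds and the rule "weight equals 
the number of distinct reporters" are all assumptions made for 
illustration; nothing here has been decided for the real service:

```python
from collections import defaultdict

MIN_REPORTERS = 2    # assumed: below this, the indicator is not listed at all
LIST_THRESHOLD = 3   # assumed: cutoff for the simple yes/no facility


class IndicatorStore:
    """Hypothetical in-memory store of who reported which indicator."""

    def __init__(self) -> None:
        # indicator -> set of distinct reporter identities
        self._reporters = defaultdict(set)

    def report(self, indicator: str, reporter: str) -> None:
        # Using a set means repeat reports from one person count once.
        self._reporters[indicator].add(reporter)

    def weight(self, indicator: str) -> int:
        """Answer for weight-aware clients: 0 until enough people agree."""
        count = len(self._reporters.get(indicator, ()))
        return count if count >= MIN_REPORTERS else 0

    def is_listed(self, indicator: str) -> bool:
        """Answer for yes/no clients: weight compared to a fixed cutoff."""
        return self.weight(indicator) >= LIST_THRESHOLD
```

A single mistaken report (such as someone reporting themselves) never 
produces a listing on its own, and the yes/no answer is derived from the 
same weight, so the two facilities cannot disagree.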

I also agree that just removing anyone who asks is a good idea; I doubt 
it would become a problem.

Finally, an observation about:

>- trim off any text after "?" or "&" (to avoid URLs like
>      http://foo.com/?49435 and http://foo.com/?9438 being treated as
>      differing, when they are not)

Sometimes the part on the end is an affiliate ID, and only one affiliate 
is spamming, so it might be useful to retain in some cases. (On the other 
hand, if one affiliate is spamming, it's probably some kind of shady 
scheme that will just encourage other idiots to do it as well, so perhaps 
it doesn't matter so much.)
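
One way to get both behaviours, sketched in Python: trim by default, but 
keep the tail for hosts where the affiliate ID matters. The function name 
and the KEEP_QUERY_HOSTS set (with its placeholder host) are 
hypothetical, not part of the proposal:

```python
from urllib.parse import urlsplit

# Hypothetical: hosts for which the affiliate ID should be retained.
KEEP_QUERY_HOSTS = {"affiliate.example"}


def normalize_url(url: str) -> str:
    """Trim everything from the first "?" or "&" onward, unless the host
    is one where the trailing part is worth keeping."""
    host = urlsplit(url).hostname or ""
    if host in KEEP_QUERY_HOSTS:
        return url
    for separator in ("?", "&"):
        url = url.split(separator, 1)[0]
    return url


print(normalize_url("http://foo.com/?49435"))
print(normalize_url("http://affiliate.example/?id=7"))
```

The two foo.com URLs from the quoted example reduce to the same string, 
while URLs on an exempted host are left untouched.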

------------------------------------
Robert L Mathews, Tiger Technologies



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk