On Wednesday, February 15, 2006, 7:00:33 PM, Matt Kettler wrote:

> 2) diversity of criteria:
> SURBL - all lists have nearly identical listing criteria, except PH. All but
> PH are "spotted in spam, doesn't appear to have legit use" and nothing more.
> JP, AB, SC, WS and OB are all effectively the same list with different input
> points.
The various SURBL lists and URIBL.com may have similar listing criteria, but
their original data sources and processing technologies (think listing rules
and logic) are mostly very different. That they happen to notice some of the
same domains can be taken as independent confirmation of spamminess. If so, I
think there is value in having the scores add as they do (there's a rough
sketch of how the sublist rules stack at the end of this message). A good
person to weigh in on this would be someone like Henry Stern, whose Perceptron
system is used to generate the scores.

OTOH you may have a valid point that most of the other, non-URIBL SA rules are
largely unrelated to one another, whereas the URIBL rules are all about the
same thing: inclusion in URIBL lists. Perhaps the score generation system
should not treat them like the other, mostly unrelated rules. OTOOH the
Perceptron scoring is literally results-driven, at least over the test
corpora, and it's often hard to argue with results.

> DNSBLs - lists have wildly different listing criteria. Some are identical to
> each other, but there are 4 different criteria in the top 5.

And wildly different FP rates. It's not too surprising that some are scored
quite low while others, like XBL, are scored relatively high. The low scores
are probably the only thing that keeps most of them slightly useful, unlike
lists such as XBL, which is highly useful.

BTW, if you or anyone finds any FPs on SURBLs, *****please***** report them to
whitelist at surbl dot org. We really need everyone's help with this. If you
use our data, please help improve our community with your feedback! If we
could make one condition of use, that would be it.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
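
To make the "scores add" point concrete: each SURBL sublist is exposed to
SpamAssassin as its own rule querying multi.surbl.org, each with its own
Perceptron-generated score, so a domain listed on several sublists trips
several rules at once. Here is a rough sketch in the style of the stock
25_uribl.cf rules; the bitmask values and the scores below are illustrative
only, not the shipped values:

  # Scores here are made-up examples; the real ones are generated by the
  # Perceptron and shipped in 50_scores.cf.
  urirhssub  URIBL_WS_SURBL  multi.surbl.org.  A  4
  body       URIBL_WS_SURBL  eval:check_uridnsbl('URIBL_WS_SURBL')
  describe   URIBL_WS_SURBL  Contains a URL listed in the WS SURBL list
  tflags     URIBL_WS_SURBL  net
  score      URIBL_WS_SURBL  1.5

  urirhssub  URIBL_JP_SURBL  multi.surbl.org.  A  64
  body       URIBL_JP_SURBL  eval:check_uridnsbl('URIBL_JP_SURBL')
  describe   URIBL_JP_SURBL  Contains a URL listed in the JP SURBL list
  tflags     URIBL_JP_SURBL  net
  score      URIBL_JP_SURBL  3.0

A URI listed on both WS and JP would match both rules, so under these example
scores it would contribute about 4.5 points total.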