Re: Starting a URIBL - Howto? [OT]

Rob McEwen Tue, 29 Apr 2008 18:26:18 -0700

(on-list follow-up)

First, earlier I presented these stats:
186/500 (ivmURI hits from the latest 500 URIBL listings)
328/500 (URIBL hits from the latest 500 ivmURI listings)


A follow-up *identical* test... only conducted later... gave these stats:
225/500 (ivmURI hits from the latest 500 URIBL listings)
282/500 (URIBL hits from the latest 500 ivmURI listings)

(geocities/blogspots/etc URIs excluded from both tests)
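
For anyone who wants to run a similar cross-check, here is a minimal Python sketch of the idea. It is not the script behind the numbers above. The function names, the excluded-suffix list, and the assumption that the input is ordered newest-first are all made up for illustration.

```python
# Hypothetical sketch: how many of list A's newest entries also appear in list B?
# The suffix list is an illustrative assumption, not either list's real policy.

EXCLUDED_SUFFIXES = (".geocities.com", ".blogspot.com")  # assumed free-hosting suffixes


def cross_hits(latest_from_a, all_of_b, sample_size=500):
    """Return (hits, total) for the newest entries of A that are also listed in B.

    `latest_from_a` is assumed to be ordered newest-first.
    """
    b = {d.lower() for d in all_of_b}
    # Drop free-hosting subdomains first, then keep the newest `sample_size` entries.
    kept = [d.lower() for d in latest_from_a
            if not d.lower().endswith(EXCLUDED_SUFFIXES)]
    sample = kept[:sample_size]
    hits = sum(1 for d in sample if d in b)
    return hits, len(sample)


def percent(hits, total):
    """Hit rate as a percentage, rounded to one decimal place."""
    return round(100.0 * hits / total, 1)
```

Applied to the figures above, percent(186, 500) gives 37.2 and percent(225, 500) gives 45.0 for ivmURI, while percent(328, 500) gives 65.6 and percent(282, 500) gives 56.4 for URIBL.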

Why the difference? Why the improvement in ivmURI? How did ivmURI *significantly* narrow that gap?


Two reasons:

(1) ivmURI's engine works faster during non-EST-business hours and weekend hours (for various reasons) ... (I'm working on ivmURI's engine right now. I've made these needed improvements with ivmSIP... now I just need to do the same with ivmURI)

(2) While much of URIBL is automated, user-submissions to URIBL wane a bit when both America and Europe are experiencing non-business hours... even non-waking hours... and weekend hours

That's the reason why ivmURI does BETTER in that testing than it did several hours ago.

...but none of this matters that much... as I'll prove later... but I present this anyways "for the record"


Dallas Engelken wrote:

ivmURI stats from last 20000 URIBL reactive listings.
-> 5519 hits
-> 14481 misses

Dallas confirmed that these initial stats he posted DID include all those geocities, blogspot, and other subdomains in URIBL that ivmURI doesn't even try to catch... and there are TONS of those now in the URIBL list. So Dallas's stats here are comparing "apples to oranges". According to Dallas's off-list comments to me, when the "subdomains" are removed, the ivmURI hits on recent URIBL listings are significantly higher than these stats he originally posted. Of course, I don't make it my goal in life to list every last domain in URIBL. But this would partially explain why my stats look so different from Dallas's stats... and why these stats (unfairly and artificially) made ivmURI look so bad.

ivmURI stats from last 20000 URIBL proactive listings.
-> 351 hits
-> 19649 misses

By "proactive listings", I discovered in my off-list conversation with Dallas that this refers to URIBL-Gold listings... where items are listed in "uribl-gold" in advance of seeing them in actual spams. But this uribl-gold list isn't available to the public and is not even prescribed as a list to use for fighting spam. I'm really disappointed that Dallas would have presented that kind of comparison to ivmURI. This is like comparing some kid's best basketball game on an X-Box to Michael Jordan's best basketball game on the court. I'm glad that URIBL-Gold is helping URIBL black get better... but until the listing actually makes it into URIBL-Black... and is then actually *usable* for blocking spam... it really doesn't count for anything. Therefore, such a comparison is not only unfair, it is downright laughable. (To be extra clear, in contrast to URIBL-gold, ALL the items reported on http://invaluement.com/results.txt HAVE been seen "in the wild" and I do have corresponding evidence spams "on file")


A LARGER QUESTION:

What matters more, how many items are in a list? Or (1) the amount of "real world" spam sent to *real* users (NOT dictionary attack spam sent to "unknown users") that a list "hits" on? Along with (2) low FP-rates.


At the moment:

SURBL has 1.34 MILLION listings
URIBL has 310K listings
ivmURI has 233K listings

But those numbers don't tell the whole story. ivmURI stands up quite well when measuring real world "hits" on spam sent to real users. When measured in the real world, ivmURI compares quite well in head-to-head-to-head tests against SURBL and URIBL... even with its smaller footprint... and ivmURI is at least as good in the low-FPs department.

But, like I said, ALL three lists are indispensable and block spam that the other two miss.


Rob McEwen
