(on-list follow-up)

First, earlier I presented these stats:
186/500 (ivmURI hits from the latest 500 URIBL listings)
328/500 (URIBL hits from the latest 500 ivmURI listings)

A follow-up *identical* test... only conducted later... gave these stats:
225/500 (ivmURI hits from the latest 500 URIBL listings)
282/500 (URIBL hits from the latest 500 ivmURI listings)

(geocities/blogspots/etc URIs excluded from both tests)
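
For anyone who wants to run this kind of cross-check themselves, here is a rough sketch of how such a test could be scripted. This is NOT the actual script behind the numbers above. The excluded-parent list and the function names (is_excluded, cross_hits, hit_rate) are assumptions made up for illustration, and it assumes each list is available as a plain sequence of domain names, newest first.

```python
# Hypothetical free-hosting parents to skip; an assumption, not the
# actual exclusion list used in the tests above.
EXCLUDED_PARENTS = ("geocities.com", "blogspot.com")


def normalize(domain):
    """Lower-case a domain and drop any trailing dot."""
    return domain.strip().lower().rstrip(".")


def is_excluded(domain):
    """True if the domain is, or sits under, an excluded free-hosting parent."""
    d = normalize(domain)
    return any(d == p or d.endswith("." + p) for p in EXCLUDED_PARENTS)


def cross_hits(latest_listings, other_list, sample_size=500):
    """Take the newest `sample_size` non-excluded listings from one list
    and count how many of them also appear in the other list.
    Returns (hits, number_of_listings_sampled)."""
    other = {normalize(d) for d in other_list}
    sample = [normalize(d) for d in latest_listings if not is_excluded(d)]
    sample = sample[:sample_size]
    hits = sum(1 for d in sample if d in other)
    return hits, len(sample)


def hit_rate(hits, total):
    """Hits as a percentage of the sample (0.0 for an empty sample)."""
    return 100.0 * hits / total if total else 0.0
```

As percentages, the two runs above work out to 37.2% and 65.6% for the first test (186/500 and 328/500), and 45.0% and 56.4% for the later test (225/500 and 282/500).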

Why the difference? Why the improvement in ivmURI? How did ivmURI *significantly* narrow that gap?

Two reasons:
(1) ivmURI's engine works faster during non-EST-business hours and weekend hours (for various reasons). (I'm working on ivmURI's engine right now. I've already made these needed improvements with ivmSIP... now I just need to do the same with ivmURI.)
(2) While much of URIBL is automated, user submissions to URIBL wane a bit when both America and Europe are experiencing non-business hours... even non-waking hours... and weekend hours.

That's the reason why ivmURI did BETTER in this test than it did several hours ago.

...but none of this matters that much... as I'll prove later... but I present this anyways "for the record"

Dallas Engelken wrote:
ivmURI stats from last 20000 URIBL reactive listings.
-> 5519 hits
-> 14481 misses
Dallas confirmed that these initial stats he posted DID include all those geocities, blogspot, and other subdomains in URIBL that ivmURI doesn't even try to catch... and there are TONS of those now in the URIBL list. So Dallas's stats here are comparing "apples to oranges". According to Dallas's off-list comments to me, when the "subdomains" are removed, the ivmURI hits on recent URIBL listings are significantly higher than these stats he originally posted. Of course, I don't make it my goal in life to list every last domain in URIBL. But this would partially explain why my stats look so different from Dallas's stats... and why these stats (unfairly and artificially) made ivmURI look so bad.

ivmURI stats from last 20000 URIBL proactive listings.
-> 351 hits
-> 19649 misses
By "proactive listings", I discovered in my off-list conversation with Dallas that this refers to URIBL-Gold listings... where items are listed in URIBL-Gold in advance of seeing them in actual spams. But this URIBL-Gold list isn't available to the public and is not even prescribed as a list to use for fighting spam. I'm really disappointed that Dallas would have presented that kind of comparison against ivmURI. This is like comparing some kid's best basketball game on an X-Box to Michael Jordan's best basketball game on the court. I'm glad that URIBL-Gold is helping URIBL-Black get better... but until a listing actually makes it into URIBL-Black... and is then actually *usable* for blocking spam... it really doesn't count for anything. Therefore, such a comparison is not only unfair, it is downright laughable. (To be extra clear, in contrast to URIBL-Gold, ALL the items reported on http://invaluement.com/results.txt HAVE been seen "in the wild" and I do have corresponding evidence spams "on file".)

A LARGER QUESTION:

What matters more: how many items are in a list? Or (1) the amount of "real world" spam sent to *real* users (NOT dictionary-attack spam sent to "unknown users") that a list "hits" on, along with (2) low FP rates?

At the moment:

SURBL has 1.34 MILLION listings
URIBL has 310K listings
ivmURI has 233K listings

But those numbers don't tell the whole story. ivmURI stands up quite well when measuring real world "hits" on spam sent to real users. When measured in the real world, ivmURI compares quite well in head-to-head-to-head tests against SURBL and URIBL... even with its smaller footprint... and ivmURI is at least as good in the low-FPs department.

But, like I said, ALL three lists are indispensable and block spam that the other two miss.

Rob McEwen
