(on-list follow-up)
First, earlier I presented these stats:
186/500 (ivmURI hits from the latest 500 URIBL listings)
328/500 (URIBL hits from the latest 500 ivmURI listings)
A follow-up *identical* test... only conducted later... gave these stats:
225/500 (ivmURI hits from the latest 500 URIBL listings)
282/500 (URIBL hits from the latest 500 ivmURI listings)
(geocities/blogspots/etc URIs excluded from both tests)
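For anyone who wants those four counts as percentages, here is a minimal sketch of the arithmetic. The counts are the ones above; the names (RUNS, hit_rate, summarize) are made up for illustration and are not part of any ivmURI or URIBL tooling.

```python
# Hit counts copied from the two test runs quoted above.
# Each run checked the latest 500 listings of one list against the other.
SAMPLE_SIZE = 500

RUNS = {
    "earlier": {"ivmURI_on_URIBL": 186, "URIBL_on_ivmURI": 328},
    "later": {"ivmURI_on_URIBL": 225, "URIBL_on_ivmURI": 282},
}


def hit_rate(hits, sample_size=SAMPLE_SIZE):
    """Return hits as a percentage of the sample."""
    return 100.0 * hits / sample_size


def summarize(runs):
    """Map each run to its two hit rates and the gap between them, in points."""
    summary = {}
    for name, counts in runs.items():
        ivm = hit_rate(counts["ivmURI_on_URIBL"])
        uribl = hit_rate(counts["URIBL_on_ivmURI"])
        summary[name] = {
            "ivmURI_on_URIBL": ivm,
            "URIBL_on_ivmURI": uribl,
            "gap": uribl - ivm,
        }
    return summary


SUMMARY = summarize(RUNS)

if __name__ == "__main__":
    for name, row in SUMMARY.items():
        print(f"{name}: ivmURI {row['ivmURI_on_URIBL']:.1f}%, "
              f"URIBL {row['URIBL_on_ivmURI']:.1f}%, gap {row['gap']:.1f} points")
```

That works out to 37.2% vs 65.6% in the earlier test (a gap of 28.4 points) and 45.0% vs 56.4% in the later one (a gap of 11.4 points).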
Why the difference? Why the improvement in ivmURI? How did ivmURI
*significantly* narrow that gap?
Two reasons:
(1) ivmURI's engine works faster during non-EST-business hours and
weekend hours (for various reasons)... (I'm working on ivmURI's engine
right now. I've made these needed improvements with ivmSIP... now I just
need to do the same with ivmURI.)
(2) While much of URIBL is automated, user-submissions to URIBL wane a
bit when both America and Europe are experiencing non-business hours...
even non-waking hours... and weekend hours.
That's the reason why ivmURI does BETTER in that testing than it did
several hours ago.
...but none of this matters that much... as I'll prove later... but I
present this anyway "for the record".
Dallas Engelken wrote:
ivmURI stats from last 20000 URIBL reactive listings.
-> 5519 hits
-> 14481 misses
Dallas confirmed that these initial stats he posted DID include all
those geocities, blogspot, and other subdomains in URIBL that ivmURI
doesn't even try to catch... and there are TONS of those now in the
URIBL list. So Dallas's stats here are comparing "apples to oranges".
According to Dallas's off-list comments to me, when the "subdomains" are
removed, the ivmURI hits on recent URIBL listings are significantly
higher than these stats he originally posted. Of course, I don't make it
my goal in life to list every last domain in URIBL. But this would
partially explain why my stats look so different from Dallas's stats...
and why these stats (unfairly and artificially) made ivmURI look so bad.
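To illustrate why leaving those hosted subdomains in the sample drags the hit rate down, here is a rough sketch of the filtering step. It is NOT the actual test script used by either of us. The suffix list holds only the two hosts named in this thread (the real exclusion set is larger), and the sample data is invented using reserved .example names.

```python
# Hosting domains named in this thread. The real exclusion list is longer,
# so treat this tuple as an assumption for illustration.
EXCLUDED_SUFFIXES = ("geocities.com", "blogspot.com")


def is_excluded(host, suffixes=EXCLUDED_SUFFIXES):
    """True if host is one of the hosting domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in suffixes)


def count_hits(listings, other_list, exclude_hosted=False):
    """Return (hits, sample_size) for listings checked against other_list."""
    sample = [h.lower().rstrip(".") for h in listings]
    if exclude_hosted:
        sample = [h for h in sample if not is_excluded(h)]
    hits = sum(1 for h in sample if h in other_list)
    return hits, len(sample)


# Invented data, for illustration only.
URIBL_SAMPLE = [
    "spam-one.example",
    "spam-two.example",
    "abc.blogspot.com",
    "www.geocities.com",
    "spam-three.example",
]
IVMURI = {"spam-one.example", "spam-three.example"}

RAW = count_hits(URIBL_SAMPLE, IVMURI)
FILTERED = count_hits(URIBL_SAMPLE, IVMURI, exclude_hosted=True)

# Raw hit rate from the figures Dallas posted (subdomain entries included).
DALLAS_RAW_RATE = 100.0 * 5519 / (5519 + 14481)

if __name__ == "__main__":
    print("raw (hits, sample):", RAW)
    print("filtered (hits, sample):", FILTERED)
    print(f"Dallas's raw rate: {DALLAS_RAW_RATE:.1f}%")
```

In the toy data the hit count is the same either way, but the denominator shrinks once the hosted subdomains are dropped, so the rate goes up. How much it goes up on the real 20000-listing sample depends on how many of those listings are subdomains, which this sketch does not attempt to guess.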
ivmURI stats from last 20000 URIBL proactive listings.
-> 351 hits
-> 19649 misses
By "proactive listings", I discovered in my off-list conversation with
Dallas that this refers to URIBL-Gold listings... where items are listed
in "uribl-gold" in advance of seeing them in actual spams. But this
uribl-gold list isn't available to the public and is not even prescribed
as a list to use for fighting spam. I'm really disappointed that Dallas
would have presented that kind of comparison to ivmURI. This is like
comparing some kid's best basketball game on an X-Box to Michael
Jordan's best basketball game on the court. I'm glad that URIBL-Gold is
helping URIBL-Black get better... but until the listing actually makes
it into URIBL-Black... and is then actually *usable* for blocking
spam... it really doesn't count for anything. Therefore, such a
comparison is not only unfair, it is downright laughable. (To be extra
clear, in contrast to URIBL-gold, ALL the items reported on
http://invaluement.com/results.txt HAVE been seen "in the wild" and I do
have corresponding evidence spams "on file")
A LARGER QUESTION:
What matters more: how many items are in a list? Or (1) the amount of
"real world" spam sent to *real* users (NOT dictionary attack spam sent
to "unknown users") that a list "hits" on, along with (2) low FP-rates?
At the moment:
SURBL has 1.34 MILLION listings
URIBL has 310K listings
ivmURI has 233K listings
But those numbers don't tell the whole story. When measured by real
world "hits" on spam sent to real users, ivmURI compares quite well in
head-to-head-to-head tests against SURBL and URIBL... even with its
smaller footprint... and ivmURI is at least as good in the low-FPs
department.
But, like I said, ALL three lists are indispensable and block spam that
the other two miss.
Rob McEwen