Re: Over-scoring of SURBL lists...

Jeff Chan Fri, 17 Feb 2006 18:57:04 -0800

On Friday, February 17, 2006, 5:34:57 PM, Matthew Eerde wrote:
> It's not particularly important how many URLs the lists have in
> common.  What is important is how many *false positives* the
> lists have in common... or more to the point, whether a given "good" URL 
> is more likely to be on (say) JP given that it's on (say) SC.


> If
>         P(JP | SC ^ good) >> P(JP | good)

> then your point is accurate, and the rules should be re-scored.

> Otherwise the rules are fine.
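
For anyone who wants to check the quoted criterion against their own ham corpus, here is a minimal sketch in Python. It assumes you already have, for each known-good URL, the set of SURBL lists that hit it; the function name, the data layout, and the sample data are all hypothetical and only illustrate the comparison of P(JP | SC ^ good) with P(JP | good).

```python
from typing import Iterable, Optional, Set, Tuple


def overlap_probabilities(
    good_hits: Iterable[Set[str]], a: str = "SC", b: str = "JP"
) -> Tuple[Optional[float], Optional[float]]:
    """Estimate (P(b | a ^ good), P(b | good)) from known-good URLs.

    good_hits holds one set per good URL, naming the lists that hit it
    (an empty set means no list hit it). This layout is an assumption.
    """
    total = on_a = on_b = on_both = 0
    for hits in good_hits:
        total += 1
        if a in hits:
            on_a += 1
        if b in hits:
            on_b += 1
        if a in hits and b in hits:
            on_both += 1
    # None signals "not enough data" instead of dividing by zero.
    p_b = on_b / total if total else None
    p_b_given_a = on_both / on_a if on_a else None
    return p_b_given_a, p_b


# Hypothetical sample: 10 good URLs, 2 hit by SC, 2 hit by JP, 1 hit by both.
sample = [{"SC"}, {"JP"}, {"SC", "JP"}] + [set() for _ in range(7)]
p_jp_given_sc, p_jp = overlap_probabilities(sample)
print(p_jp_given_sc, p_jp)  # 0.5 0.2
```

With real data the counts of false positives are likely to be small, so the two estimates should be read with that in mind before drawing conclusions about re-scoring.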

The FP rates on SC and JP are consistently very low across the
scores that have been posted by many people.  OB and WS have
higher FP rates, but they don't tend to have a lot of FPs in
common.  We review our data for FPs every day.

Unless someone can provide some examples of FPs on multiple SURBL
lists, I think Matt K. may be worried about a situation that
doesn't happen very often.

We are constantly looking for FPs and FNs and for ways to improve
performance.  We obsess over FPs as much as anyone, and frankly
we don't find many.  If anyone does spot any FPs, I sure wish
they would report them to us: 

  whitelist at surbl. org

Instead of making accusations that seem vague and groundless,
please give us feedback on any FPs!

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
