Theo Van Dinter wrote:
> On Sun, Feb 19, 2006 at 02:20:05AM -0500, Matt Kettler wrote:
> >>>> How can we keep the spam tagged, and try to mitigate the FPs by keeping
> >>>> additive scores for multiple URIBLs more moderate? +20 worth of URIBL
> >>>> hits is fine on spam, but astronomically high scores don't really help
> >>>> SA when the tagging threshold is +5. However, they do hurt SA when
> >>>> overlapping mistakes happen.
> >>>>
> >> Yes.. which is exactly the audience I was primarily trying to reach by
> >> posting here on the spamassassin list, before this turned into a large
> >> misunderstanding between the URIBL operators and myself.
> >>
>
> I have two things related to this:
>
> 1- if the lists are indeed separate (ie: different sources, etc,)
> then having multiple rules makes sense.

They're about 95% separate. They're all separately maintained, and they take a lot of different approaches to making sure a listing is valid. I don't think there are any direct cross-feeds where one spamtrap operator feeds their trap data into multiple lists.
However, there's some potential for duplicate input because of the end-user reporting. Take Joe User, who gets a message he considers spam. He runs spamassassin -r on it, reporting the message to SpamCop and Razor (e8 is URI-based, so relevant here; Pyzor and DCC also get reports, but those are less relevant). The SpamCop report would require multiple reports, but if that happens it feeds into SC and AB, which then re-check the URIs. He then pulls out a few URIs and manually reports them to URIBL. He then goes to rulesemporium.com and reports it to WS. If he's got an Outblaze account, he could also report to OB.

All of the above have differing degrees of checking to make sure the link isn't a false report, so multiple failures have to occur for FPs to happen. But I found two examples in a search of 100 nonspam emails at work and 218 at home. Admittedly these were examples on separate sites, and with two lists which are generally high-FP for me, but it shows that failures can cascade. While this is an "extreme" case, and most of these lists have user reports as only a small percentage of their total input, it does show how the same message can end up feeding several lists at once.

That's why I'm suggesting we consider a base+offset approach to SURBL. It allows each list to be scored independently, but also allows the perceptron to allocate scores that reflect the overlap.

> related to this, I mentioned earlier in the thread about a bug I found
> in the reuse section of mass-check while generating some statistics.
> we used the reuse code to generate the 3.1 scores. however, due
> to the bug, rule hits were lost. so it's hard to say exactly what
> occured because of it, but the scores generated for network tests
> (those that enabled reuse anyway) are almost definitely miscalculated,
> and potentially very miscalculated (see the same previous post about
> the "way different" SURBL WS rule hits that I found).

Yeah, that's bad. What surprises me is the actual magnitude of the results.
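To make the base+offset idea concrete, here's a minimal sketch of one way it could combine URIBL hits. The rule names and score values are purely illustrative assumptions, not actual SpamAssassin rules or scores: each list keeps its own independent base score, and a single shared offset is added once when any list hits, so overlapping listings of the same URI grow the total sub-additively instead of stacking linearly.

```python
# Hypothetical per-list base scores (illustrative values only).
BASE_SCORES = {
    "URIBL_WS": 1.5,
    "URIBL_OB": 1.5,
    "URIBL_BLACK": 2.0,
}

# Shared offset, applied once if ANY URIBL rule hits. This is the part
# the perceptron could tune to reflect how much the lists overlap.
URIBL_OFFSET = 1.0

def uribl_score(hits):
    """Combine URIBL hits as per-list bases plus one shared offset."""
    if not hits:
        return 0.0
    return URIBL_OFFSET + sum(BASE_SCORES[h] for h in hits)

# One list hitting contributes its base plus the offset; each additional
# overlapping list adds only its base, keeping the combined total moderate.
```

Under this sketch a single WS hit scores 2.5, while all three lists together score 6.0 rather than the 7.5 that three fully independent (base+offset-each) rules would give.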
My own experience is that WS and OB both have FP problems, but they're at about the same level. URIBL_BLACK has at least 10x more FPs than all the SURBL-hosted lists combined, including WS... But you guys see fewer.

> We're trying to get updates going for 3.1, and I'm hoping to get scores
> generated more frequently after that's setup. Perhaps the next set of
> scores will address your issue more directly?

Possibly.

> Is the problem more that in the past there weren't a large number of FPs and
> now there are?

In the past FPs were rare and always confined to one list. In the past 6 months I've seen a dramatic increase in FPs from WS, OB, and BLACK.