>...
>List Mail User wrote:
>> Huh?  (Lookup "strawman" in a dictionary, please.)
>That's my understanding of what you were claiming happened. Yes, it
>looks like an absurdly weak argument. However, it's the argument you
>presented, as best I can make sense of your posts.
>
>Or are you admitting that you made those arguments intentionally as a
>straw man to confuse the issues?
>
>> Scenario 3:)
>>      A hardware store posts a *direct* link to a rebate form (or had a
>> pad of them on or near a shelf);  Customer prints (or takes a copy of) the
>> rebate form (with no opportunity to ever see the Scotts' web page telling
>> him that any email he provides will be used for marketing);  Customer buys
>> a Scotts' product at the store and mails in the printed form,  Customer then
>> receives spam.  QED
>>
>>      Please read the rebate form, then read the Scotts' privacy policy.
>>   
>
>As I read the form, and Scotts privacy policy, Scotts cannot send you
>additional marketing information by email just because you entered this
>contest. As I read it they can only send you requested mail, but it
>depends on how you interpret the sentence structure.
>
>Regardless, even if we accept your theory that their privacy policy
>allows it, it doesn't prove, or even suggest, that they did.
>
>I find your willingness to accept a twisted reading of a privacy policy
>as satisfactory proof of spamming activity rather disturbing.
>

        Hold on here - there are two different pages/forms involved.
One is for a contest - a copy still exists at:

http://www.winterizewithscotts.com/index.tbapp?page=intro

The other is a rebate form, with a copy at:

http://www.winterizewithscotts.com/index.tbapp?page=rebate_page

To get to the rebate form by navigating Scotts' own pages, you must go
through a page which contains a link to a privacy policy which states that
you do consent to receive promotional materials.  But the rebate form itself
does not say any such thing, and if a third party (Lowe's, Home Depot, etc.)
just provided a link to the rebate form, or a consumer used a "rebate site",
they would never see the policy.  In all likelihood the "promotional material"
is simply the mailings which you already get by choice, so you've never seen
anything "extra".

        If you look, you'll see as I did that the contest page *does* have
a link to the "privacy policy", and as such I consider anyone who signed
up to have given "informed consent" (even if they are Joe Average, and didn't
know that was what they were doing).

        So once again, I'll say:  Based on the REBATE form, I believe it
is more likely than not that at least some consumers did receive unsolicited
commercial email from Scotts - i.e. spam.

--------------------------------------------------------------------------------

        Now let's leave all the arguing about specific cases behind, because
I believe you have a point regarding the interactions of the various RBLs
used in SA, not just for URIs but also for DNS_* and RCVD_* rules.

        From my reading of the FAQ and of the code used by the perceptron
and by the meta-rule construction, there is a strong bias against creating
meta-rules after the fact (i.e. after the perceptron run) and an extremely
strong bias against negative-scoring meta-rules (a separate issue, but
related to the exponent used in weighting the construction of such rules).

        I also believe that a better result would come from having more URI
RBLs in use (currently the SBL is the only IP-based RBL used for URIs, and
hence for NS lookups).  A naive construction of all possible meta-rules
quickly explodes into an exponentially large number of cases even with
simple "and" clauses (counting-clause meta-rules and other types of
construction would add even more).  Also, there exist several RHS and
IP-based lists usable for URIs which have a much higher FP rate than any of
the SURBLs (ignoring the recent 7% [ws] report) or URIBL, but which have
proven very useful to me.  Some of these include using the AHBL for URIs
(it outperforms the RCVD rule for me, with more spam hits and a lower FP
rate), the RFCI lists as URI rules, (armored clothing on) the SORBS spamtrap
list, and the completewhois list; the last two have the advantage of being
IP-based, so they catch nameservers on listed IPs just as the SBL does.
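
        For concreteness, here is a minimal Python sketch of how an IP-based
URI list check works - reverse the IPv4 octets and query them under the
list's zone; any A record means "listed".  Only sbl.spamhaus.org is a zone
name I am sure of, the SORBS spamtrap and completewhois zones would have to
be filled in; and, as I understand it, SA's own URIDNSBL code additionally
walks the domain's nameserver IPs, which the stdlib resolver used here
cannot do:

    import socket

    def ip_listed(ip, zone):
        """Return True if the IPv4 address appears in the given IP-based
        DNSBL zone (reversed-octet query convention)."""
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)   # any A record means "listed"
            return True
        except socket.gaierror:           # NXDOMAIN -> not listed
            return False

    def uri_host_listed(host, zone):
        """Check the A record of a hostname pulled out of a message URI.
        (The NS-walking step done by SA's URIDNSBL code is omitted here,
        since the stdlib resolver cannot do NS lookups.)"""
        try:
            return ip_listed(socket.gethostbyname(host), zone)
        except socket.gaierror:
            return False

    # Example: uri_host_listed("example.com", "sbl.spamhaus.org")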

        The evaluation of meta-rules is *much* less expensive than that of
the primary rules (only simple expressions to evaluate, no additional DNS
lookups or server queries required).  If we consider the 5 SURBLs (not
counting XS), 2 URIBL lists (black and grey, ignoring red), 5 RFCI lists,
completewhois, SORBS' spamtrap list, the SBL and the RAZOR "CF"/e8 test(s),
we have at least 16 rules which take URIs into account; all of the possible
"and"-ed meta-rule cases would give us tens of thousands of meta-rules
(2^16 - 1 for the "and" combinations alone) - far too many.  But by guessing
which ones are reasonably related we could simply construct the 31 rules
created from the set of RFCI lists (5 choose 1 plus 5 choose 2, and so on,
i.e. 2^5 - 1), the 323 rules constructed from the SURBL, URIBL and AHBL
lists, and the 7 rules created by combinations of the SBL, completewhois and
SORBS rules.  If these were used as inputs to the perceptron run from the
start (instead of possibly being derived after the fact by the
evolve_metarule code - which, by my reading of the documentation, would
never find or even examine most of these cases), you might indeed find that
the cases for multiple hits would result in much lower scores.  It is also
very likely that many of these combinations would not have a high enough hit
rate to warrant inclusion in the set of SA rules - so I am not proposing
adding 150 new rules outright, but merely testing that number and then
selecting the (expected) few dozen that actually have value.  By including
them in the original perceptron test inputs, we would immediately see
whether the old individual URI rules were lowered in value and whether the
"new" meta-rules raised those values all the way back up.
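
        As a rough sketch of the "and" construction (the rule names here are
just placeholders, not the real SA rule names), the subset enumeration is a
few lines of Python - 2^5 - 1 = 31 rules for the five RFCI lists and
2^3 - 1 = 7 for the SBL/completewhois/SORBS group:

    import itertools

    # Placeholder rule-name groups; the real SA names differ.
    RFCI  = ["URI_RFCI_%d" % i for i in range(1, 6)]               # 5 lists
    IPURI = ["URIBL_SBL", "URI_COMPLETEWHOIS", "URI_SORBS_TRAP"]   # 3 lists

    def and_metarules(prefix, rules):
        """One meta-rule per non-empty subset: 2**len(rules) - 1 in all
        (single-rule subsets are kept only to match the counts above)."""
        out = []
        for k in range(1, len(rules) + 1):
            for combo in itertools.combinations(rules, k):
                name = prefix + "_" + "_".join(r.split("_")[-1] for r in combo)
                out.append("meta %s (%s)" % (name, " && ".join(combo)))
        return out

    rules = and_metarules("META_RFCI", RFCI) + and_metarules("META_IPURI", IPURI)
    print(len(rules))    # 31 + 7 = 38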

        Add in the counting cases and (assuming a similar grouping) we add
only a few dozen more test rules to these three URI rule groups - another
15 rules (5 + 7 + 3).  Using both the "equal to" and the "greater than or
equal to" forms gives 12 more cases (you can't be greater than the maximum).
If we also form the possible meta-rules from the cross product of all the
different group "counting" rules, we get 1050 more rules to test
(10 * 15 * 7 - the zero cases should also be included).  All told we now
have ~1500 new rules to test, but expect only a few dozen to have useful
values.
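
        The counting cases fall straight out of SA's arithmetic meta syntax
(assuming expressions like "(A + B + C) >= 2" and "==" comparisons are both
accepted in meta rules); a sketch, again with placeholder rule names, which
drops the redundant ">= maximum" form:

    GROUP = ["URI_RFCI_%d" % i for i in range(1, 6)]   # placeholder names

    def counting_metarules(prefix, rules):
        total = " + ".join(rules)
        out = []
        for n in range(1, len(rules) + 1):
            out.append("meta %s_EQ_%d ((%s) == %d)" % (prefix, n, total, n))
            if n < len(rules):   # ">= max" would just duplicate "== max"
                out.append("meta %s_GE_%d ((%s) >= %d)" % (prefix, n, total, n))
        return out

    for line in counting_metarules("META_RFCI_CNT", GROUP):
        print(line)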

        Also, performing a similar set of meta-rule constructions for the
DNS_* and RCVD_* rules would give a couple of thousand more test cases and
most likely result in another hundred or so rules of significant value.

        Also, constructing "cross" type rules after the fact in the evolve
code now becomes possible and maybe even likely (e.g. URI in > 2 RHS RBLs,
DNS_ hits in > 1 IP RBL and RCVD_ hits in > 2 RBLs - or similar cases);
these might yield a few dozen more useful rules.
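
        A single "cross" rule of that sort is just one more generated line;
the rule names below are from memory and should be treated as placeholders:

    URI_RHS  = ["URIBL_WS_SURBL", "URIBL_OB_SURBL", "URIBL_SC_SURBL",
                "URIBL_AB_SURBL", "URIBL_PH_SURBL"]
    RCVD_RBL = ["RCVD_IN_XBL", "RCVD_IN_SORBS_DUL", "RCVD_IN_DSBL",
                "RCVD_IN_BL_SPAMCOP_NET"]

    def count_expr(rules):
        return "(" + " + ".join(rules) + ")"

    print("meta META_CROSS_URI_RCVD (%s > 2 && %s > 2)"
          % (count_expr(URI_RHS), count_expr(RCVD_RBL)))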

        Yet another class of potentially profitable meta-rules could be
constructed by the intersection of RCVD_* rules and digest rules (if
it comes from a DUL, the XBL, DSBL, the AHBL or SpamCop and is bulk...).
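
        The RCVD-plus-digest intersections are the same kind of construction;
with the stock digest rule names as I recall them (again, treat these as
placeholders), the whole set is a short nested loop:

    RELAY  = ["RCVD_IN_SORBS_DUL", "RCVD_IN_XBL", "RCVD_IN_DSBL",
              "RCVD_IN_BL_SPAMCOP_NET"]
    DIGEST = ["RAZOR2_CHECK", "DCC_CHECK", "PYZOR_CHECK"]

    for r in RELAY:
        for d in DIGEST:
            name = "META_%s_AND_%s" % (r.replace("RCVD_IN_", ""),
                                       d.split("_")[0])
            print("meta %s (%s && %s)" % (name, r, d))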

        Of course this means that the scoring runs will take far more time
and resources than they do today, but the actual run time of operating SA
would barely change at all.

        I strongly suspect this would lead overall to a few hundred new
meta-rules, with almost all of them (both the meta-rules and the existing
"direct" ones) ending up with much lower scores than most of the current net
tests are assigned, *and* your concerns about overlapping FPs would be
directly addressed.

        Clearly this is not hard - all the cases I have listed could easily
be generated exhaustively by a simple script (yes, I'm volunteering if any
developer who can run the tests on a large corpus asks).

        I strongly believe that this would result in a much higher number
of rules overall, with a much lower average score and a much lower FP rate.

        As always, net tests do almost nothing for those who get spam at the
start of a spam run, and the "classic" rules and the work of SARE continue to
be the only way to stop much of that; but adding URI tests for both SORBS
spamtraps and completewhois *will* stop some cases of new spam by catching
IPs which have previously been used but are currently being spammed with new
domains, or domains using known "evil" nameserver IPs (as it is, only the
SBL can catch those cases in the default SA distribution today).

        Paul Shupak
        [EMAIL PROTECTED]
