Theo Van Dinter wrote:
On Tue, Sep 19, 2006 at 10:58:46PM +0200, mouss wrote:
URI_NOVOWEL fires with things like href="#id" where id is a string that starts with 7 "no-vowel" chars.

uri URI_NOVOWEL             m%^https?://[^/?]*[bcdfghjklmnpqrstvwxz]{7}%i
uri URI_NOVOWEL             m%^https?://[^/?\#]*[bcdfghjklmnpqrstvwxz]{7}%i

is this correct?

That depends on your definition of "correct".  The RE looks ok, but the
hitrate could change dramatically.  It's hard to say without testing.


my understanding is that the rule looks for "dummy" hostnames in the server part. unfortunately, because of the way URIs are "exposed" by SA, this rule also applies to anything that resembles a URI. This is a problem with relative URIs (e.g. href="foo.html", if foo matches the rule). [In the past, I have reported problems with things like ldap strings, ... that were interpreted as URIs by SA and caught by some rules].
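
To make the difference between the two patterns concrete, here is a minimal Python sketch that runs both regexes over a few sample strings. It is only an illustration: the exact string SA hands to the rule for href="#id" is an assumption here (shown as a fragment placed right after "http://", or right after the hostname), the sample hostnames are made up, and Python's re module stands in for the Perl regex engine. It says nothing about the hit rate on real mail, which would still need testing against a corpus.

```python
import re

# Consonant class copied from the rule ('y' is not in it).
CONSONANTS = "bcdfghjklmnpqrstvwxz"

# Current rule: '#' is allowed before the 7-consonant run.
ORIGINAL = re.compile(r"^https?://[^/?]*[%s]{7}" % CONSONANTS, re.I)
# Proposed rule: '#' now ends the host part as well.
PROPOSED = re.compile(r"^https?://[^/?#]*[%s]{7}" % CONSONANTS, re.I)


def fires(pattern, uri):
    """Return True if the rule pattern matches the given URI string."""
    return pattern.search(uri) is not None


# Hypothetical samples; the first two model a fragment-only link
# as it is assumed to reach the rule.
SAMPLES = [
    "http://#bcdfghjk",
    "http://example.com#bcdfghjk",
    "http://xkqzvbnm.example.com/",
    "http://www.example.com/",
]

for uri in SAMPLES:
    print(uri, "original:", fires(ORIGINAL, uri), "proposed:", fires(PROPOSED, uri))
```

Under these assumptions the proposed pattern stops firing on the fragment cases while still firing on a no-vowel hostname.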

in the present case, the FP occurred for a "silly" newsletter that I whitelisted (it triggers other rules, but I am not the recipient; otherwise, I would block them at smtp time). so whether this is a real FP or not is debatable.

however, my understanding of the rule is that it was not designed to catch such relative URIs. If so, then it should be fixed. thus my question.

In other words, should we "fix" the rule because it catches things it was not designed to catch, or should we be happy that it detects spam it was not supposed to catch? This is a general question of course.

I personally tend to believe that when Bayes is used, "logical" rules should only catch what they were supposed to catch. and I do use Bayes (I disabled Bayes for two months to see the results, and while this was done on a single installation, the results showed that Bayes is very helpful).
