John D. Hardin wrote:
On Thu, 21 Sep 2006, mouss wrote:

Theo Van Dinter wrote:
On Tue, Sep 19, 2006 at 10:58:46PM +0200, mouss wrote:
URI_NOVOWEL fires with things like href="#id" where id is a string that starts with 7 "no-vowel" chars.

uri URI_NOVOWEL             m%^https?://[^/?]*[bcdfghjklmnpqrstvwxz]{7}%i
uri URI_NOVOWEL             m%^https?://[^/?\#]*[bcdfghjklmnpqrstvwxz]{7}%i

is this correct?
That depends on your definition of "correct".  The RE looks ok, but the
hitrate could change dramatically.  It's hard to say without testing.
My understanding is that the rule looks for "dummy" hostnames in the server part. Unfortunately, because of the way URIs are "exposed" by SA, this rule also applies to anything that resembles a URI. This is a problem with relative URIs (e.g. href="foo.html", if foo matches the rule).

Erm. How can it match relative and "#gibberish" URIs at all if the RE
is explicitly anchored to "https?://" at the start of the URI?

This is what I meant by "exposed". The URI module "converts" things to URI format even though they are not in the message in that form. The goal is to catch things like "www.spammer.example" without multiplying the number of regexes. Of course, rawbody won't catch that.
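
For anyone who wants to compare the two rules quoted above without running SA, here is a rough sketch in Python. It only exercises the regexes themselves. The sample URIs are made up for illustration, and the assumption that SA hands the rule a string like "http://host#fragment" (no slash before the "#") is mine, not something confirmed in this thread. The `\#` escape from the rule is dropped here since Python does not need it inside a character class.

```python
import re

# Same consonant class as in the URI_NOVOWEL rule (note: no "y").
CONSONANT_RUN = r"[bcdfghjklmnpqrstvwxz]{7}"

# The rule as it stands, and the variant that also stops at "#".
ORIGINAL = re.compile(r"^https?://[^/?]*" + CONSONANT_RUN, re.I)
PROPOSED = re.compile(r"^https?://[^/?#]*" + CONSONANT_RUN, re.I)

# Hypothetical sample URIs, not taken from real mail.
SAMPLES = [
    "http://www.example.com#bcdfghjk",   # fragment directly after the host
    "http://bcdfghjk.example.com/",      # gibberish in the hostname itself
    "http://www.example.com/#bcdfghjk",  # fragment after a path slash
]


def fires(pattern, uri):
    """Return True if the compiled rule pattern matches the URI string."""
    return pattern.search(uri) is not None


if __name__ == "__main__":
    for uri in SAMPLES:
        print(f"{uri}  original={fires(ORIGINAL, uri)}  proposed={fires(PROPOSED, uri)}")
```

If the assumption holds, only the first sample separates the two versions: the original pattern runs through the "#" into the fragment, while the proposed one stops at it. This says nothing about the hit rate on real mail, which still needs a proper test run against a corpus.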
