Re: uri rules

mouss Wed, 28 May 2008 08:33:41 -0700

Randy Ramsdell wrote:

Matt Kettler wrote:
Joseph Brennan wrote:
I was surprised that this rule...

 uri CU_CN_LINK      /http:..\w+\.cn\b/

matches not only this...

 <a href="http://foobar.cn">

but also this...
 <a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>

First, I did not realize that SpamAssassin's idea of "uri" includes not
only the uri, but the start tag, end tag, and all in between.  That's
useful but not real clear in Mail::SpamAssassin::Conf.

Actually, it doesn't.. your second example has two URIs as far as SpamAssassin is concerned. "http://www.columbia.edu/foo.html" and "http://Kuxun.cn". Two separate URIs.
Since many email clients "auto-link" domains in text portions, like www.google.com, SpamAssassin tries to find text strings that clients will treat as URIs and use them in the URI tests as well.

How so? How does spamassassin URI check determine Kuxun.cn in a URI as opposed to someone who forgot to add a "space" after a sentence end? Is it because it is located within the "a" tag?


try putting this
   "I often forget spaces.it happens to me all the time..."
in a message and run with -D. you'll see:

...
[74536] dbg: uridnsbl: domains to query: spaces.it
...

[74536] dbg: rules: ran uri rule __LOCAL_PP_NONPPURL ======> got hit: "http://spaces.it"

...

As you see, SA can't guess that a space is missing, so it checks the "resulting" URI anyway.
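
To make the effect concrete, here is a rough sketch in Python of how a "looks like a domain" heuristic ends up turning that sentence into a URI. This is not SpamAssassin's actual parser: the TLDS tuple, the BARE_DOMAIN pattern and guess_uris() are hypothetical names invented for the illustration, and the real code knows far more TLDs and edge cases.

```python
import re

# Hypothetical, heavily simplified stand-in for the "auto-link" heuristic.
# The TLD list is an illustrative subset, not SpamAssassin's real list.
TLDS = ("com", "net", "org", "it", "cn")
BARE_DOMAIN = re.compile(
    r"\b((?:[a-z0-9-]+\.)+(?:%s))\b" % "|".join(TLDS),
    re.IGNORECASE,
)

def guess_uris(text):
    """Return an http:// URI for every token that looks like a bare domain."""
    return ["http://" + m.group(1) for m in BARE_DOMAIN.finditer(text)]

text = "I often forget spaces.it happens to me all the time..."
print(guess_uris(text))  # ['http://spaces.it']
```

Nothing in the text says whether the author meant a domain or just dropped a space, so a heuristic of this kind has to treat both the same way.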



Things get "tricky" when you want to hit things like
   Did you visit http://www.example.com/foo/bar?if so...
and you are looking for specific patterns in the "bar" part...
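
A small Python sketch of why that case is awkward. It assumes (this is an assumption, check with -D) that the URI handed to the rule keeps the text glued on after the "?"; the names strict and tolerant are made up for the example.

```python
import re

# Assumption: the extracted URI includes the text that follows "?" with no space.
uri = "http://www.example.com/foo/bar?if"

# A pattern that expects the URI to end right after "bar"
strict = re.compile(r"/foo/bar$")

# A pattern that also accepts a "?" or "#" directly after "bar"
tolerant = re.compile(r"/foo/bar(?:[?#]|$)")

print(bool(strict.search(uri)))    # False
print(bool(tolerant.search(uri)))  # True
```

The same idea carries over to a uri rule: avoid anchoring the end of the pattern on the path unless you also allow for a trailing query string.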


Second, I can't figure out how \w+ matches the punctuation and spaces!

It doesn't. :)
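
This is easy to check outside SpamAssassin. The sketch below uses Python's re module with the same pattern (the syntax used here behaves the same as in Perl) and tries it against the two URIs described above, then against the whole tag as one string. The variable names are only for illustration.

```python
import re

# Same pattern as the CU_CN_LINK rule: /http:..\w+\.cn\b/
CU_CN_LINK = re.compile(r"http:..\w+\.cn\b")

# The two URIs SpamAssassin is said to see for the second example;
# the second one comes from the auto-linked anchor text.
uris = ["http://www.columbia.edu/foo.html", "http://Kuxun.cn"]
hits = [u for u in uris if CU_CN_LINK.search(u)]
print(hits)  # ['http://Kuxun.cn']

# The whole tag taken as a single string does not match at all,
# because \w+ cannot cross the punctuation and spaces.
html = '<a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>'
print(CU_CN_LINK.search(html))  # None
```

So the hit comes from the second, separate URI, not from \w+ stretching across the tag.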
