Joseph Brennan wrote:

I was surprised that this rule...

 uri CU_CN_LINK      /http:..\w+\.cn\b/

matches not only this...

 <a href="http://foobar.cn">

but also this...

<a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>


First, I did not realize that SpamAssassin's idea of "uri" includes not
only the uri, but the start tag, end tag, and all in between.  That's
useful but not real clear in Mail::SpamAssassin::Conf.

Actually, it doesn't. Your second example contains two URIs as far as SpamAssassin is concerned: "http://www.columbia.edu/foo.html" and "http://Kuxun.cn". They are two separate URIs.

Since many email clients "auto-link" domains in text portions, like www.google.com, SpamAssassin tries to find text strings that clients will treat as URIs and use them in the URI tests as well.
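
To make the idea concrete, here is a toy sketch in Python of collecting candidate URIs from a fragment: the href values, plus bare domains found in the visible text. This is not SpamAssassin's actual parsing code; the function name, the patterns, and the three-entry TLD list are all assumptions chosen only for illustration.

```python
import re

# Hypothetical, heavily simplified stand-in for the behaviour described above;
# this is NOT how SpamAssassin itself is implemented.
HREF = re.compile(r'href="([^"]+)"')
TAG = re.compile(r"<[^>]+>")
# Toy pattern: a dotted name ending in one of a few example TLDs (assumed list).
BARE_DOMAIN = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+(?:com|edu|cn)\b")


def candidate_uris(html):
    """Return href targets plus bare domains found in the visible text."""
    uris = HREF.findall(html)
    # Drop the tags so only the text a mail client would display remains.
    text = TAG.sub(" ", html)
    for match in BARE_DOMAIN.finditer(text):
        # A client that auto-links the domain would treat it as an http URI.
        uris.append("http://" + match.group(0))
    return uris


if __name__ == "__main__":
    line = '<a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>'
    for uri in candidate_uris(line):
        print(uri)
```

Under these assumptions, the example line yields two separate candidates, one from the href and one from the link text.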


Second, I can't figure out how \w+ matches the punctuation and spaces!

It doesn't. :) The rule matched the second URI, "http://Kuxun.cn", on its own, so \w+ only had to match "Kuxun".
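
A quick way to check this is to try the pattern against the whole HTML line and against each URI separately. The sketch below uses Python's `re` module rather than Perl, on the assumption that this particular pattern behaves the same in both; the variable and function names are made up for the example, and the list of URIs is simply the two named in the reply above.

```python
import re

# The pattern from the CU_CN_LINK rule, written as a Python regex.
CN_LINK = re.compile(r"http:..\w+\.cn\b")

# The whole HTML line from the second example.
html_line = '<a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>'

# Assumed for illustration: the two URIs SpamAssassin is said to see.
uris = ["http://www.columbia.edu/foo.html", "http://Kuxun.cn"]


def matching_uris(candidates):
    """Return the candidates that the pattern matches, tested one at a time."""
    return [uri for uri in candidates if CN_LINK.search(uri)]


if __name__ == "__main__":
    # \w+ cannot cross ".", "/" or spaces, so the full line does not match.
    print("whole line matches:", CN_LINK.search(html_line) is not None)
    print("matching URIs:", matching_uris(uris))
```

The whole line does not match, because \w+ stops at the first "." after "www"; only the separately tested "http://Kuxun.cn" does.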

