Joseph Brennan wrote:

I was surprised that this rule...

 uri CU_CN_LINK      /http:..\w+\.cn\b/

matches not only this...

 <a href="http://foobar.cn">

but also this...

<a href="http://www.columbia.edu/foo.html">KooXoo Buys Kuxun.cn Domain</a>


First, I did not realize that SpamAssassin's idea of "uri" includes not
only the uri, but the start tag, end tag, and all in between.


It actually hits "Kuxun.cn" in the link text, not the href part. SpamAssassin also picks up bare domain names in the text as uris, because some spammers write uris without the http part (and without an href).

The drawback is that uri checks may hit things that are not really domains, such as ldap strings, program names (program.com), etc.

That's useful but not real clear in Mail::SpamAssassin::Conf.

Second, I can't figure out how \w+ matches the punctuation and spaces!

See above. Just run with -D and you'll see:
...
[73674] dbg: rules: ran uri rule CU_CN_LINK ======> got hit: "http://Kuxun.cn"
...
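
To make the point about \w+ concrete, here is a rough sketch using Python's re module rather than SpamAssassin itself. The list of candidate uris is an assumption based on the debug line above (the href value, plus the bare domain from the link text with "http://" prepended); it is not the output of SpamAssassin's real parser, and the names candidate_uris, matching_uris and whole_tag are made up for illustration.

```python
import re

# Same pattern as the CU_CN_LINK rule, written for Python's re module.
CU_CN_LINK = re.compile(r"http:..\w+\.cn\b")

# Assumed list of uri strings for the second message: the href value and
# the bare domain from the link text with "http://" prepended, as the
# debug line suggests. Illustrative only, not real SpamAssassin output.
candidate_uris = [
    "http://www.columbia.edu/foo.html",
    "http://Kuxun.cn",
]


def matching_uris(uris):
    """Return the uri strings that the pattern hits."""
    return [u for u in uris if CU_CN_LINK.search(u)]


# The whole anchor element as one string, to check whether \w+ could
# bridge the punctuation and spaces between "http://www" and ".cn".
whole_tag = (
    '<a href="http://www.columbia.edu/foo.html">'
    "KooXoo Buys Kuxun.cn Domain</a>"
)

print(matching_uris(candidate_uris))   # only the bare-domain uri is hit
print(CU_CN_LINK.search(whole_tag))    # None: no match across the whole tag
```

So the pattern never matches the start tag plus the text in between. It only matches once "Kuxun.cn" is treated as a uri of its own.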



Joseph Brennan
Columbia University I T


