John Hardin wrote:
On Thu, 1 Oct 2009, Warren Togami wrote:

uri T_CN_URL      /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL      /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a .cn URL whose label before .cn is exactly 8 characters long

http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck: 63243 out of 124241 spam messages hit T_CN_URL, nearly 51%.

7263 T_CN_URL hits in a spam corpus of 15517
7200 T_CN_8_URL hits in a spam corpus of 15517
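
For reference, the hit rates implied by the counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The function name hit_rate is made up for illustration, and the counts are copied from the figures quoted in this message.

```python
def hit_rate(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


# (label, hits, corpus size) as quoted in the thread.
COUNTS = [
    ("T_CN_URL, masscheck", 63243, 124241),
    ("T_CN_URL, 15517 corpus", 7263, 15517),
    ("T_CN_8_URL, 15517 corpus", 7200, 15517),
]

for label, hits, total in COUNTS:
    print(f"{label}: {hit_rate(hits, total):.2f}%")
```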

Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.

I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part.

That doesn't explain 51%, though.
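
To illustrate the anchoring point, here is a rough Python sketch. It assumes the Perl pattern carries over to Python's re module unchanged, which should hold for a pattern this simple. The sample URIs and the host-anchored variant ANCHORED_CN are made up for illustration and are not rules from this thread.

```python
import re

# Python translation of T_CN_URL as posted (the /i flag becomes re.I).
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.I)

# Hypothetical variant anchored to the host part of the URI.
# Not a rule from the thread; shown only for comparison.
ANCHORED_CN = re.compile(r"^https?://[^/?#]+\.cn(?::\d+)?(?:$|[/?#])", re.I)

# Made-up sample URIs.
SAMPLES = [
    "http://example.cn/",                            # .cn host
    "http://example.com/mirror/site.cn/index.html",  # .cn only in the path
    "http://example.com/page.cn?x=1",                # .cn only in the path
    "http://example.cnn.com/",                       # not .cn at all
]


def hits(pattern: re.Pattern, uri: str) -> bool:
    """True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None


for uri in SAMPLES:
    print(f"{uri}  unanchored={hits(T_CN_URL, uri)}  anchored={hits(ANCHORED_CN, uri)}")
```

The unanchored pattern matches the two path-only samples as well as the real .cn host, while the anchored variant matches only the real .cn host.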


I run my own custom .cn TLD URI rule, and whilst its hit rate is right down at the moment, in the past it has certainly hit 50% or more of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.

uri             LOCAL_URI_CN            m{https?://.{1,40}\.cn\b}
describe        LOCAL_URI_CN            contains link to Chinese tld
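
A quick way to see what this pattern does and does not catch is to try it against a few URIs in Python. This is only a sketch: it assumes the Perl pattern behaves the same in Python's re module, and the sample URIs and the helper name local_uri_cn_hits are made up for illustration.

```python
import re

# Python translation of LOCAL_URI_CN as posted (no /i flag, so case-sensitive).
LOCAL_URI_CN = re.compile(r"https?://.{1,40}\.cn\b")


def local_uri_cn_hits(uri: str) -> bool:
    """True if the pattern matches anywhere in the URI."""
    return LOCAL_URI_CN.search(uri) is not None


# Made-up sample URIs.
SAMPLES = [
    "http://example.cn/",              # .cn host
    "http://example.com/a.cn",         # .cn in the path: '.' also matches '/'
    "http://example.cn.example.com/",  # .cn as an inner label: \b sees the '.'
    "http://example.cnn.com/",         # no word boundary after 'cn'
    "http://" + "a" * 41 + ".cn/",     # more than 40 characters before .cn
]

for uri in SAMPLES:
    print(f"{uri}  hit={local_uri_cn_hits(uri)}")
```

The first three samples match and the last two do not, so the .{1,40} window and the trailing \b limit the match but do not restrict it to the host's TLD.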
