John Hardin wrote:
On Thu, 1 Oct 2009, Warren Togami wrote:
uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain
uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain with a label exactly 8 characters long
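To see what the two test rules actually match, here is a minimal Python sketch. It assumes the Perl regexes translate directly to Python's `re` (they use no Perl-only syntax) and it simply runs `re.search()` on hypothetical sample URI strings. It does not model how SpamAssassin extracts or normalizes URIs from a message.

```python
import re

# Python translations of the two Perl regexes from the uri rules above.
# SpamAssassin applies uri rules to each URI it extracts from a message;
# here we just feed sample URI strings to re.search().
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

# Hypothetical sample URIs, chosen for illustration only.
SAMPLES = [
    "http://abcdefgh.cn/",      # 8-character label directly under .cn
    "http://abcdefghi.cn/",     # 9-character label
    "http://www.abcdefgh.cn",   # 8-character label behind a www. prefix
    "http://example.cn/page",   # 7-character label
    "http://example.com/",      # not .cn at all
]

def rule_hits(uri: str) -> dict:
    """Return which of the two test rules match a single URI string."""
    return {
        "T_CN_URL": bool(T_CN_URL.search(uri)),
        "T_CN_8_URL": bool(T_CN_8_URL.search(uri)),
    }

if __name__ == "__main__":
    for uri in SAMPLES:
        print(uri, rule_hits(uri))
```

Because `[/.]+` must sit immediately before the eight word characters, a 9-character label such as `abcdefghi.cn` hits T_CN_URL but not T_CN_8_URL.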
http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck: 63243 out of 124241 spam messages hit T_CN_URL,
nearly 51%.
7263 T_CN_URL hits in a 15517-message spam corpus
7200 T_CN_8_URL hits in the same 15517-message spam corpus
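A quick way to put these counts side by side is to turn them into percentages. The sketch below uses only the figures quoted in this thread. The last line relies on one observation: any URI that T_CN_8_URL matches is also matched by T_CN_URL, since eight word characters never contain a `/`. Its T_CN_8_URL hits are therefore a subset of its T_CN_URL hits.

```python
# Hit rates computed only from the counts quoted in this thread.
def hit_rate(hits: int, total: int) -> float:
    """Percentage of `total` messages that a rule hit."""
    return 100.0 * hits / total

if __name__ == "__main__":
    print(f"masscheck T_CN_URL:   {hit_rate(63243, 124241):.1f}%")  # 50.9%
    print(f"local T_CN_URL:       {hit_rate(7263, 15517):.1f}%")    # 46.8%
    print(f"local T_CN_8_URL:     {hit_rate(7200, 15517):.1f}%")    # 46.4%
    # T_CN_8_URL hits are a subset of T_CN_URL hits, so this is the share
    # of T_CN_URL hits that also hit T_CN_8_URL in the local corpus.
    print(f"8-char share of hits: {hit_rate(7200, 7263):.1f}%")     # 99.1%
```

In other words, in that corpus roughly 99% of the .cn hits come from 8-character labels.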
Does this make any sense? This is funny. Could someone add this rule
to the sandbox? I'm just curious.
I note that neither is anchored at the beginning of the URI, so they may
be hitting on .cn embedded somewhere within the path part.
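The anchoring concern can be illustrated with a small Python sketch. It compares the unanchored T_CN_URL pattern with a hypothetical anchored variant, `T_CN_HOST`. That variant is not from the thread and is not a tested SpamAssassin rule. It only allows `.cn` at the end of the part between `scheme://` and the first `/`, `?` or `#`, optionally followed by a port. The sample URIs are also made up for illustration.

```python
import re

# Unanchored pattern, translated from the T_CN_URL rule quoted above.
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)

# HYPOTHETICAL anchored variant (not from the thread): .cn must end the
# authority part, optionally followed by a port, path, query or fragment.
T_CN_HOST = re.compile(r"^https?://[^/?#]*\.cn(?:[:/?#]|$)", re.IGNORECASE)

def compare(uri: str) -> tuple:
    """Return (unanchored_hit, anchored_hit) for one URI string."""
    return bool(T_CN_URL.search(uri)), bool(T_CN_HOST.search(uri))

if __name__ == "__main__":
    for uri in (
        "http://www.example.cn/",               # .cn host: both hit
        "http://example.com/files/report.cn/",  # .cn only in the path
        "http://example.com/?ref=foo.cn",       # .cn only in the query
    ):
        print(uri, compare(uri))
```

The unanchored pattern hits all three samples, while the anchored one hits only the first. Note also that T_CN_URL misses a host with an explicit port such as `http://example.cn:8080/`, because `:` is not in its `(?:$|\/|\?)` alternation.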
That doesn't explain 51%, though.
I run my own custom .cn TLD URI rule, and whilst its hit rate is quite
low at the moment, in the past it has certainly hit around 50% or more
of all spam containing a URI. So, depending on the corpus, I'm not
surprised by the 51%.
uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld
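For comparison, here is a rough Python translation of LOCAL_URI_CN, run against hypothetical sample URIs. The Perl rule has no `/i` flag, so the translation is case-sensitive too. As before, this only exercises the regex and makes no claim about SpamAssassin's URI extraction or normalization.

```python
import re

# Python translation of LOCAL_URI_CN; no /i in the Perl rule, so no
# re.IGNORECASE here either.
LOCAL_URI_CN = re.compile(r"https?://.{1,40}\.cn\b")

def local_uri_cn(uri: str) -> bool:
    """True if the LOCAL_URI_CN pattern occurs anywhere in the URI."""
    return bool(LOCAL_URI_CN.search(uri))

if __name__ == "__main__":
    for uri in (
        "http://www.example.cn/",         # .cn host
        "http://example.com/a.cn",        # .cn in the path, within 40 chars
        "http://example.cnn.com/",        # \b rejects ".cnn"
        "http://example.cn-mirror.com/",  # "-" is a word boundary, so it hits
        "http://" + "a" * 41 + ".cn/",    # .cn more than 40 chars after "://"
    ):
        print(uri, local_uri_cn(uri))
```

The `.{1,40}` window keeps the rule near the start of the URI, but it can still reach into a short path. The `\b` after `cn` excludes `.cnn` while still letting a hyphenated name like `.cn-mirror.com` through.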