From: "Warren Togami" <wtog...@redhat.com>
Sent: Thursday, 2009/October/01 10:24
On 10/01/2009 01:16 PM, Warren Togami wrote:
On 10/01/2009 01:05 PM, John Hardin wrote:
On Thu, 1 Oct 2009, jdow wrote:
From: "John Hardin" <jhar...@impsec.org>
Yours may still hit .cn in the path part. May I suggest:
m;^https?://[^/?]+\.cn\b;
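As a rough illustration of why the anchored pattern avoids hits in the path part, here is a Python transcription of the Perl regex above. The function name and the sample URLs are made up for the example; only the pattern itself comes from the thread.

```python
import re

# Python transcription of the Perl pattern quoted above.
# [^/?]+ cannot cross a "/" or "?", so ".cn" must occur before the
# path or query string begins, i.e. inside the host part of the URL.
CN_HOST = re.compile(r"^https?://[^/?]+\.cn\b")


def hits_cn_host(url: str) -> bool:
    """Return True if the pattern finds '.cn' in the host part of url."""
    return CN_HOST.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://example.cn/index.html",       # .cn in the host
        "http://example.com/mirror.cn/file",  # .cn only in the path
        "http://example.com/?ref=site.cn",    # .cn only in the query
        "http://example.cn.example.com/",     # .cn as an inner host label
    ]
    for url in samples:
        print(url, hits_cn_host(url))
```

Note that the last sample also matches: the pattern only requires ".cn" followed by a word boundary somewhere in the host, so it does not insist that .cn is the final label.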
Regardless of their correctness, would you care to expound on the
success of these two rules, John? I like what works, not political
correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.
"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.
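To make the S/O idea concrete, here is a small sketch. It assumes S/O means the spam hit rate divided by the sum of the spam and ham hit rates, which is my understanding of how the mass-check reports present it; the function name and the ham figure in the usage lines are hypothetical.

```python
def spam_over_overall(spam_hits: int, spam_total: int,
                      ham_hits: int, ham_total: int) -> float:
    """Share of a rule's hit rate that comes from spam (assumed S/O).

    Rates are used instead of raw counts so that a corpus with far
    more spam than ham does not inflate the ratio by itself.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        raise ValueError("rule never hit; S/O is undefined")
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    # 51% of spam is the figure mentioned in this thread; the ham
    # count below is invented purely for illustration.
    print(spam_over_overall(510, 1000, 10, 1000))
```

A value near 1.0 means the rule hits almost only spam, and a value near 0.5 means it says nothing either way.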
I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in China, and I _do_ get a fair amount of bulk
emails from manufacturers in China purportedly looking for business
partners.
The "Oddity" I was pointing out at the beginning of the thread is not
the prevalence of .cn URIs, but rather that most of them appear to be
exactly 8 characters long. Could someone please commit my T_CN_8_URL
rule to the sandbox so we can see if that trend holds beyond my own
corpora?
Warren
(And yes, I'm fully aware even this narrowed rule is prejudiced and unsafe.
This is partly out of curiosity, and also wondering if it could be made
useful if meta booleaned with something else.)
Warren
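For anyone who wants to check the 8-character observation against their own corpus before a rule lands in the sandbox, here is a minimal sketch. The T_CN_8_URL rule itself is not quoted in this message, so this is not that rule; it assumes "8 characters" refers to the label directly in front of .cn, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_cn_8_label(url: str) -> bool:
    """True if the host ends in .cn and the label before it has 8 chars.

    Assumption: the "8 characters" in the thread means the registered
    label immediately left of the .cn TLD, not the whole URL.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return len(labels) >= 2 and labels[-1] == "cn" and len(labels[-2]) == 8


if __name__ == "__main__":
    for url in ("http://abcdefgh.cn/", "http://abc.cn/",
                "http://www.abcdefgh.cn/x", "http://abcdefgh.com/"):
        print(url, is_cn_8_label(url))
```

Counting how many spam and ham URIs satisfy this check would show whether the trend holds, and the result could then be combined with other tests as suggested above.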
I just had a thought, Warren. Look up Chinese numerology. 8 signifies
wealth or sudden prosperity. Conversely, I suspect few Chinese names
are four characters. Four is a pun on death. Some social sites might
like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese.
9 is also slang or vulgar in Cantonese.
I wonder how many companies that deal with China have figured out that
an "888" toll free number is WONDERFUL, "Wealth, wealth, wealth."
I understand numerology is quite important to the Chinese. (Of course,
I am not claiming to be an expert. The above is mostly Wikipoodle and
surmise.)
{^_-}