On Tue, 21 Aug 2012, Adam Moffett wrote:

> One of our users definitely emails with Chinese vendors. I'm sure they correspond in English, but I'm guessing the Chinese folks might have Chinese characters in their signature line or some such.

Consider Bayes.

I have trained my Bayes with Chinese-language spams and they are all getting BAYES_99 now. If you do decide to train on Chinese-language spams, you will definitely want to also train hams from your user's Chinese vendors, so that legitimate use of non-Latin characters in .sigs or message headers doesn't get scored as spam.

Be sure to keep your training corpora on hand so that you can un-train those messages if it doesn't work out.
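
A rough sketch of that workflow with sa-learn, in case it helps. The corpus directory and the chinese-spam / chinese-ham subdirectory names are made up for illustration, and this assumes you run it as the user whose Bayes database you want to train, so adjust to your setup:

```shell
#!/bin/sh
# Hypothetical corpus location: one message per file, kept around
# after training so the same messages can be forgotten later.
CORPUS="${CORPUS:-$HOME/bayes-corpus}"
SA_LEARN="${SA_LEARN:-sa-learn}"

# Train both sides: the Chinese-language spam, and ham from the
# user's real Chinese vendors.
train() {
    "$SA_LEARN" --spam "$CORPUS/chinese-spam"
    "$SA_LEARN" --ham "$CORPUS/chinese-ham"
}

# Back out the training if it doesn't work out. This only works
# if you still have the exact messages you trained on.
untrain() {
    "$SA_LEARN" --forget "$CORPUS/chinese-spam"
    "$SA_LEARN" --forget "$CORPUS/chinese-ham"
}
```

Source the file and call train; if the results are bad, call untrain against the same unchanged directories.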

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
-----------------------------------------------------------------------
 3 days until the 1933rd anniversary of the destruction of Pompeii