Hi everyone,

Our company gets a lot of legitimate and not-so-legitimate E-mail in Chinese. Our people in Taiwan have quite a bit more spam slip through than our US and European offices. Having read a lot of warnings about using a UTF-8 locale, I am running SA 2.55 with LANG=en_US on RH8.

Does it make any sense to feed Chinese (mostly HTML) E-mail as spam/ham to Bayes? Would Bayes learn Chinese words as meaningless single-byte "words"? Does it matter? Should I try to use a UTF-8 locale?

Any experiences would be greatly appreciated, especially from mail admins in double-byte-speaking offices.
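To make the worry about byte-level "words" concrete, here is a toy sketch in Python. It is not SpamAssassin's tokenizer; `naive_tokens` and the sample text are hypothetical. It assumes a tokenizer that treats the message as opaque bytes and splits on ASCII whitespace and punctuation. It shows that such a splitter can cut a Big5 character in half, because a Big5 trail byte may fall in the ASCII range, while UTF-8 multibyte characters survive intact.

```python
import re

# Hypothetical byte-level splitter: breaks on ASCII control characters,
# whitespace and punctuation, and knows nothing about character sets.
SEPARATORS = re.compile(rb"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")

def naive_tokens(raw):
    """Return the non-empty byte runs between ASCII separators."""
    return [t for t in SEPARATORS.split(raw) if t]

sample = "成功 free offer"  # made-up sample text

# In Big5 the second byte of "功" is 0x5C (ASCII backslash), so the
# splitter treats it as punctuation and truncates the Chinese word.
big5_tokens = naive_tokens(sample.encode("big5"))

# In UTF-8 every byte of a multibyte character is >= 0x80, so ASCII
# separators never fall inside a character.
utf8_tokens = naive_tokens(sample.encode("utf-8"))

print(big5_tokens)
print(utf8_tokens)
```

Even when a Chinese run stays in one piece, a splitter like this still yields it as a single long token, since Chinese text has no spaces between words. Whether that is good enough for Bayes is exactly what I am asking.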
Related - I think - Bayes question: If the E-mail body is HTML, does sa-learn use "body" or "rawbody" when scoring words?

Thank you very much,
Sergei Genchev

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk