Genchev, Sergei said:

> Does it make any sense to feed Chinese (mostly HTML) E-mail as spam/ham to
> bayes? Would Bayes learn Chinese words as meaningless single-byte "words"?
> Does it matter? Should I try to use UTF-8 locale? Any experiences would be
> greatly appreciated, especially from mail admins in double-byte-speaking
> offices.

Yes, it is worthwhile. Bayes uses an approximate method for this, but apparently it works quite well: it feeds the text in as tuples of 16 bits, that is, two bytes at a time.

> Related - I think - Bayes question: If E-mail body is HTML, does sa-learn
> use "body" or "rawbody" when scoring words?

It uses "body".

--j.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
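
For readers who want to picture the "tuples of 16 bits" idea from the reply, here is a minimal sketch in Python. It is not SpamAssassin's actual tokenizer. The function name `tokenize` and the exact splitting rules (runs of high-bit bytes cut into 2-byte pieces, printable ASCII runs kept as whole words) are assumptions made purely for illustration.

```python
import re
from typing import List

# Hypothetical illustration only; NOT SpamAssassin's real tokenizer.
# Matches either a run of high-bit bytes (typical of double-byte
# encodings) or a run of printable ASCII characters (a plain "word").
_CHUNK = re.compile(rb"[\x80-\xff]+|[\x21-\x7e]+")


def tokenize(raw: bytes) -> List[bytes]:
    """Split raw message bytes into tokens for a Bayes-style learner."""
    tokens: List[bytes] = []
    for match in _CHUNK.finditer(raw):
        chunk = match.group()
        if chunk[0] >= 0x80:
            # High-bit run: emit 16-bit (2-byte) tuples. An odd trailing
            # byte becomes a 1-byte token, which is part of why the
            # method is only approximate.
            tokens.extend(chunk[i:i + 2] for i in range(0, len(chunk), 2))
        else:
            # ASCII run: keep the whole word as one token.
            tokens.append(chunk)
    return tokens


# Example: two GB2312-encoded characters followed by an ASCII word.
print(tokenize("免费 offer".encode("gb2312")))
```

With a true double-byte encoding such as GB2312, each 2-byte tuple in this sketch lines up with one character. With UTF-8, where most Chinese characters take three bytes, the tuples would straddle character boundaries, so the tokens are still consistent from message to message but no longer correspond to individual characters.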