Genchev, Sergei said:

> Does it make any sense to feed Chinese (mostly HTML) e-mail to Bayes as
> spam/ham? Would Bayes learn Chinese words as meaningless single-byte "words"?
> Does it matter? Should I try using a UTF-8 locale? Any experiences would be
> greatly appreciated, especially from mail admins in double-byte-speaking
> offices.

Yes, it is worthwhile.  The method is approximate, but apparently it works
quite well: Chinese text is fed in as 16-bit (two-byte) tuples, not as
meaningless single bytes.
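As a rough illustration of what "tuples of 16 bits" means, here is a minimal Python sketch. It assumes that a space-free run of double-byte text (for example GB2312- or Big5-encoded Chinese) arrives as one long token and is then chopped into consecutive two-byte pieces. The length threshold and the function name are hypothetical and are not taken from SpamAssassin's code.

```python
import re

# Hypothetical threshold: only tokens longer than this are split into pairs.
MAX_TOKEN_LEN = 15
# Two consecutive high-bit bytes (0xA0-0xFF), typical of GB2312/Big5 text.
HIGH_BIT_PAIR = re.compile(rb"[\xa0-\xff]{2}")


def tokenize(raw: bytes) -> list:
    """Split raw text on ASCII whitespace, then break long tokens that
    contain double-byte characters into 2-byte (16-bit) tuples."""
    tokens = []
    for tok in raw.split():
        if len(tok) > MAX_TOKEN_LEN and HIGH_BIT_PAIR.search(tok):
            # Chinese has no spaces between words, so a whole phrase shows
            # up as one long token; cut it into consecutive 2-byte pieces.
            # An odd-length token leaves a final 1-byte piece.
            tokens.extend(tok[i:i + 2] for i in range(0, len(tok), 2))
        else:
            tokens.append(tok)
    return tokens
```

In a pure double-byte run each 2-byte piece lines up with one Chinese character, so Bayes learns characters rather than isolated bytes. If an odd number of ASCII bytes sits inside the run, the pairing slips out of alignment, which is one way such a method can be only approximate.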

> A related Bayes question, I think: if the e-mail body is HTML, does sa-learn
> use "body" or "rawbody" when scoring words?

body (the decoded text with the HTML markup stripped), not rawbody.
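To make the body/rawbody distinction concrete, here is a minimal Python sketch of the two views of an HTML part. It is an illustration only, not SpamAssassin's code: the function names are hypothetical, and the tag stripping shown is far simpler than SpamAssassin's own HTML handling.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, dropping the tags.
    (A real stripper would also skip <script>/<style> contents.)"""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rawbody(html: str) -> str:
    # rawbody-style view: decoded text with the markup left in place.
    return html


def body(html: str) -> str:
    # body-style view: markup stripped, runs of whitespace collapsed.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For `<p>Cheap <b>pills</b></p>`, the body view yields just `Cheap pills`, so tag names never reach the tokenizer as words.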

--j.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
