Yes, that's right: it becomes difficult if we want to support multiple
languages in one spam detection solution. It's true that there are some
best practices for a single language, but I haven't seen much support for
multiple languages.
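
To make the question concrete, here is a minimal Python sketch of the
"tokenize, then count tokens per class" idea described further down in this
thread. It is not SpamAssassin's code. The names tokenize and TinyBayes are
hypothetical, and splitting runs of Chinese characters into overlapping
two-character tokens is only one assumed way to cope with text that has no
spaces between words.

```python
import math
import re
from collections import Counter

# Assumption: treat the basic CJK Unified Ideographs block as "Chinese text".
# A token is either a run of CJK characters or a run of other word characters.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[^\W_\u4e00-\u9fff]+")


def is_cjk(ch):
    return "\u4e00" <= ch <= "\u9fff"


def tokenize(text):
    """Split text into words; split CJK runs into overlapping bigrams."""
    tokens = []
    for run in TOKEN_RE.findall(text.lower()):
        if is_cjk(run[0]) and len(run) > 1:
            # No spaces between words, so fall back to character bigrams.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


class TinyBayes:
    """Toy naive Bayes: counts in how many spam/ham messages a token occurs."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.messages[label] += 1
        # Count each token once per message.
        self.counts[label].update(set(tokenize(text)))

    def spam_score(self, text):
        """Return log-odds of spam vs. ham; a value above 0 leans spam."""
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(tokenize(text)):
            # Add-one smoothing so unseen tokens do not produce log(0).
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score


if __name__ == "__main__":
    nb = TinyBayes()
    nb.train("spam", "免费赠品 点击这里")
    nb.train("spam", "free gift click here")
    nb.train("ham", "会议安排在明天 meeting tomorrow")
    nb.train("ham", "see you at the meeting")
    print(tokenize("Buy now 免费赠品"))
    print(nb.spam_score("免费赠品"), nb.spam_score("meeting tomorrow"))
```

The counting part does not care which language the tokens come from; only
the tokenizer does. That is why the tokenizer is what I am asking about.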

---
Yu Qian
Ottawa Ontario
Phone: (514)-553-0198



On Tue, Apr 12, 2016 at 1:38 PM, Reindl Harald <h.rei...@thelounge.net>
wrote:

> STAY ON LIST
>
> On 12.04.2016 at 19:22, Yu Qian wrote:
>
>> Yes, right. What I am interested in is that the Chinese language is
>> different. So does SpamAssassin have a strong tokenizer to handle that,
>> or does it just use the same tokenizer?
>>
>> On Tue, Apr 12, 2016 at 1:16 PM, Reindl Harald <h.rei...@thelounge.net>
>> wrote:
>>
>>
>>
>>     On 12.04.2016 at 18:44, Yu Qian wrote:
>>
>>         SpamAssassin uses Bayes as its classifier, which is typical and
>>         efficient for English. But how does it process other languages,
>>         such as Asian languages?
>>
>>         Can anyone explain that, or show the code where SpamAssassin
>>         does it?
>>
>>
>>     Bayes is by definition language-agnostic.
>>
>>     *You train* Bayes with samples of ham and spam (at least a few
>>     hundred of each). The tokenizer splits the messages into parts and
>>     builds a database of how often each word appears in spam and in ham
>>     (put simply).
>>
>>
>>
>>
>>
> --
>
> Reindl Harald
> the lounge interactive design GmbH
> A-1060 Vienna, Hofmühlgasse 17
> CTO / CISO / Software-Development
> m: +43 (676) 40 221 40, p: +43 (1) 595 3999 33
> icq: 154546673, http://www.thelounge.net/
>
> http://www.thelounge.net/signature.asc.what.htm
>
>
