--- Kelson <[EMAIL PROTECTED]> wrote:
> Stand H wrote:
> > I'm not sure if I can feed non-english email to
> > sa-learn.
>
> Bowie Bailey wrote:
> > Let it learn as much
> > ham and spam as you can manage and don't worry
> > about languages.
>
> One thing to look out for: Try to get both ham and
> spam for each language. The last thing you want is
> for Bayes to decide that common, let's say, German
> words are signs of spam because the only German text
> it's ever seen is spam.
>
> As Bowie points out, Bayes doesn't care about the
> languages themselves -- it's the tokens (for practical
> purposes, the words). It doesn't care whether
> "Necesito ir a casa a las dos y media." is Spanish,
> it only cares whether it's seen the words "Necesito",
> "ir", "casa", etc. more often in ham or in spam.
>
> --
> Kelson Vibber
> SpeedGate Communications <www.speed.net>

Hi Kelson and Bowie,

Thank you for your replies.

If the sender's mail client doesn't encode the message properly,
should I still train on it? Some users receive messages with a
subject like

    ¿Ü©â½çÜâÍ

which is considered illegal and hits SUBJ_ILLEGAL_CHARS. When the
subject is encoded properly, it looks like

    =?iso-2022-jp?B?GyRCJEokKyQ/JDckYyRpGyhC?=

and the body is encoded as

    =82=BF=82=DC=82=A9

Does it make sense to train on messages like these? I'm curious how
Bayes can work effectively with these illegal and encoded characters.

Another thing: say a friend forwards an email to me (he just wants to
let me know the info in the message) and I want to train his email as
ham. Should I just train on it as it is, or remove some of the
headers first?

Thank you.
Stand
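
For reference, the two encoded samples above can be decoded with a few lines of Python to see the text they carry. This is only a rough sketch and says nothing about how SpamAssassin tokenizes mail internally. The subject is written with the leading "=" that an RFC 2047 encoded-word requires, and the body charset is not stated in the message, so Shift_JIS is an assumption here.

```python
import quopri
from email.header import decode_header

# Subject sample from the message, written with the leading "=" that an
# RFC 2047 encoded-word requires.
raw_subject = "=?iso-2022-jp?B?GyRCJEokKyQ/JDckYyRpGyhC?="

# decode_header returns a list of (bytes_or_str, charset) pairs.
subject_bytes, subject_charset = decode_header(raw_subject)[0]
subject_text = subject_bytes.decode(subject_charset)

# Body sample from the message, in quoted-printable form.
raw_body = b"=82=BF=82=DC=82=A9"
body_bytes = quopri.decodestring(raw_body)

# Assumption: the message does not name the body charset; Shift_JIS is a
# guess based on the byte values.
body_text = body_bytes.decode("shift_jis")

print(subject_charset, repr(subject_text))
print(body_bytes, repr(body_text))
```

With a declared charset, the encoded forms decode to ordinary text. The raw 8-bit subject carries no charset label, so a receiver can only guess at its meaning.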