On Tuesday, 9 May 2006 23:01 Bowie Bailey wrote:
> Hmm... If you are training Bayes, and all of your ham is in English,
> then what does Bayes do with the Chinese ham your customers get?

Nothing. But you won't get a SPAM report from Bayes for a Chinese e-mail if you never feed it Chinese-language e-mail. So no FPs.

> True, spam is spam. It's the vast differences in ham that I am more
> worried about. Our customers are salesmen for the most part, so they
> are constantly sending and receiving marketing type emails. For us,
> marketing stuff is almost always considered spam. I think this would
> cause a problem with false positives for our customers if I train
> Bayes based on our idea of ham and spam.

The important thing is that you should *never* feed Bayes, as SPAM, something that *could* be a legit e-mail. Most people seem to make that mistake. I do NOT feed it borderline messages, neither as SPAM nor as HAM. Just those Nigerians who want to give you a few million $ because you are so nice, the lotteries where you won a lot but have to pay first, the very good jobs so many people seem to offer where you can earn $5000 for only 3 hours of work, and so on. No chance this could be HAM for anybody (with at least some brain, but anyway you have to protect such people from themselves *g*).

The same goes for feeding HAM: give it only mail that *is* legit e-mail, not mail that merely could be. Remember: 10 good SPAM and HAM messages are better than 200 of which 5% are wrong.

Another good thing: since I help with mass-checks, I found that of my 6000 SPAMs, there were about 4 or 5 that I had to delete (after unlearning them first), as they were mistakes. That's the benefit you get back from running mass-checks.

Best regards,
zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
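
The routine described in this mail (unlearn the mistakes first, then feed only messages that are certainly SPAM or certainly HAM) could be scripted roughly like this. It is only a sketch: the directory names and the two helper functions are made up for illustration, and it assumes SpamAssassin's `sa-learn` tool is on the PATH. By default it only prints the commands instead of running them.

```python
import subprocess


def build_sa_learn_commands(spam_dir, ham_dir, mistakes_dir=None):
    """Return the sa-learn invocations as argv lists, in the order to run them.

    All three directory arguments are hypothetical, hand-sorted folders.
    """
    commands = []
    if mistakes_dir is not None:
        # Unlearn misfiled messages first, while the files still exist.
        commands.append(["sa-learn", "--forget", str(mistakes_dir)])
    # Only messages that could never be legit mail for anybody.
    commands.append(["sa-learn", "--spam", str(spam_dir)])
    # Only messages that definitely are legit mail.
    commands.append(["sa-learn", "--ham", str(ham_dir)])
    return commands


def train(spam_dir, ham_dir, mistakes_dir=None, dry_run=True):
    """Print the commands (dry run) or actually execute them."""
    for argv in build_sa_learn_commands(spam_dir, ham_dir, mistakes_dir):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Example folder names, assumed for illustration only.
    train("corpus/spam-certain", "corpus/ham-certain", "corpus/mistakes")
```

Anything you are unsure about simply stays out of all three folders, so Bayes never sees it.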