Michael Monnerie wrote:
> On Dienstag, 9. Mai 2006 23:01 Bowie Bailey wrote:
> > Hmm... If you are training Bayes, and all of your ham is in English,
> > then what does Bayes do with the Chinese ham your customers get?
>
> Nothing. But you won't get a SPAM report from bayes if the e-mail is
> chinese and you never feed chinese language e-mail. So no FPs.

I guess that would work if you simply don't feed Bayes with any foreign
language material at all.

> > True, spam is spam. It's the vast differences in ham that I am more
> > worried about. Our customers are salesmen for the most part, so
> > they are constantly sending and receiving marketing type emails.
> > For us, marketing stuff is almost always considered spam. I think
> > this would cause a problem with false positives for our customers
> > if I train Bayes based on our idea of ham and spam.
>
> The important thing is that you should *never* feed to bayes something
> that *could* be a legit e-mail. Most people seem to make that error. I
> do NOT feed SPAM nor HAM that could be a legit mail.

So you are saying that I should not feed Bayes with the unsolicited
marketing garbage that I get because it looks like something that could
have been requested?

> Just those nigerian who want to give you some million $ because you
> are so nice, or those lotteries where you won a lot but before you
> have to pay, the very good jobs a lot of people seem to offer where
> you can earn 5000$ for only 3 hours of work and so on.
>
> No chance this could be HAM for anybody (with at least some brain, but
> anyway you have to protect such people from themselves *g*). The same
> for feeding HAM: Give it only food that *is legit e-mail*, not some
> which could be.
>
> Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.

Wrong for who? If it looks like marketing, 99% of the time, I don't
want it. And for most of the accounts that I deal with, this goes up to
100%. Not true for my customers, tho.

My philosophy with Bayes has always been to skip the ham/spam
definitions and go with a wanted/unwanted model. This way Bayes learns
to filter out the emails you don't want even if some of them may
technically be ham.

(Obviously, I would not be able to do this on a site-wide installation)

> Another good thing: Since I help with mass-checks, I found that of my
> 6000 SPAMs, I had about 4 or 5 which I had to delete (but unlearn
> before), as they were mistakes. That's the advantage you get back when
> running mass-checks.

--
Bowie