Michael Monnerie wrote:
> On Tuesday, 9 May 2006 23:01 Bowie Bailey wrote:
> > Hmm... If you are training Bayes, and all of your ham is in English,
> > then what does Bayes do with the Chinese ham your customers get?
> 
> Nothing. But you won't get a SPAM verdict from Bayes if the e-mail is
> in Chinese and you never feed it Chinese-language e-mail. So no FPs.

I guess that would work if you simply don't feed Bayes with any
foreign language material at all.
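To see why untrained material stays neutral, here is a minimal sketch of a generic Robinson-style per-token estimate. This is an illustration under stated assumptions, not SpamAssassin's actual code: the function name, the toy training sets, and the defaults s=1.0 and x=0.5 are all hypothetical. A token that never appeared in training has no evidence, so its probability falls back to the neutral prior x and it votes neither spam nor ham.

```python
from collections import Counter

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham, s=1.0, x=0.5):
    """Hypothetical Robinson-style smoothed spam probability for one token.

    spam_counts/ham_counts: Counter of token -> number of trained messages containing it.
    n_spam/n_ham: number of spam/ham messages trained.
    s: strength of the prior; x: assumed probability for an unseen token.
    """
    b = spam_counts[token] / n_spam if n_spam else 0.0  # fraction of spam with the token
    g = ham_counts[token] / n_ham if n_ham else 0.0     # fraction of ham with the token
    n = spam_counts[token] + ham_counts[token]          # total times the token was seen
    p = b / (b + g) if (b + g) > 0 else x               # raw estimate from the counts
    return (s * x + n * p) / (s + n)                    # shrink toward x when evidence is thin

# Toy English-only training data (hypothetical).
spam_msgs = [{"lottery", "winner", "claim"}, {"million", "transfer", "nigeria"}]
ham_msgs = [{"meeting", "tomorrow", "agenda"}, {"invoice", "attached", "meeting"}]
spam_counts = Counter(t for m in spam_msgs for t in m)
ham_counts = Counter(t for m in ham_msgs for t in m)

for tok in ["lottery", "meeting", "会议"]:  # "会议" is Chinese for "meeting", never trained
    print(tok, token_spam_prob(tok, spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)))
```

With these toy counts, "lottery" leans spam (0.75), "meeting" leans ham, and the untrained Chinese token sits at exactly 0.5, contributing no evidence either way.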

> > True, spam is spam.  It's the vast differences in ham that I am more
> > worried about.  Our customers are salesmen for the most part, so
> > they are constantly sending and receiving marketing-type emails.
> > For us, marketing stuff is almost always considered spam.  I think
> > this would cause a problem with false positives for our customers
> > if I trained Bayes based on our idea of ham and spam.
> 
> The important thing is that you should *never* feed Bayes something
> that *could* be a legit e-mail. Most people seem to make that mistake.
> I do NOT feed SPAM or HAM that could be legit mail.

So you are saying that I should not feed Bayes with the unsolicited
marketing garbage that I get because it looks like something that
could have been requested?

> Just those Nigerians who want to give you millions of dollars because
> you are so nice, those lotteries where you've won a lot but first have
> to pay, the great jobs so many people seem to offer where you can earn
> $5000 for only 3 hours of work, and so on.
> 
> There's no chance this could be HAM for anybody (at least anybody with
> some brain, but you have to protect such people from themselves
> anyway *g*). The same goes for feeding HAM: give it only food that *is*
> legit e-mail, not anything that merely could be.
> 
> Remember: 10 correctly classified SPAM and HAM are better than 200
> where 5% are wrong.

Wrong for whom?  If it looks like marketing, 99% of the time I don't
want it.  And for most of the accounts that I deal with, this goes up
to 100%.  Not true for my customers, though.

My philosophy with Bayes has always been to skip the ham/spam
definitions and go with a wanted/unwanted model.  This way Bayes
learns to filter out the emails you don't want even if some of them
may technically be ham.  (Obviously, I could not do this on a
site-wide installation.)

> Another benefit: since I help with mass-checks, I found that of my
> 6000 SPAMs, about 4 or 5 were mistakes, which I had to unlearn and
> then delete. That's the payoff you get from running mass-checks.

-- 
Bowie
