Alex <mysqlstud...@gmail.com> writes:

> It's only these few types of messages that are very subjective and
> experience from the broader open source community would be appreciated.
>
> If it has a legitimate unsubscribe link, does that make it ham?
>
> What criteria do you use to determine "spamminess/haminess of EVERY
> message"?

I think you're asking the wrong question. A message is spam if it is bulk and unsolicited. So it is ham if the user asked for it (truly asked, not failed to uncheck a pre-checked box), and it is spam if they did not. Of course, humans are not reliable about remembering what they asked for.

Having an unsubscribe link *absolutely* does not make it ham. So you have to ask users to classify, or you are just guessing.

> Is the goal to have every message one of either BAYES_00 or BAYES_99 or is
> it okay that newsletters (for example) are BAYES_50, and let other rules,
> like network checks, determine the score?

In general, the closer to the edge something is, the more useful the score, but you can't actually push them all to 00/99. There could be a newsletter that user A asked for, and is thus ham, but that user B did not ask for, so when it arrives for them it is spam. Thus, you tend to need per-user bayes. But if bayes says 50, that's life and you roll with it.
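
To make the per-user point concrete, here is a rough sketch of why one shared database cannot get that newsletter right for both users. This is a toy token classifier with add-one smoothing, not SpamAssassin's actual Bayes implementation; the class `UserBayes`, the `stores` dict, and the sample message are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class UserBayes:
    """Toy per-user token classifier (illustrative only)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Count each distinct token once per message.
        self.tokens[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Sum per-token log-odds, using add-one smoothing so that
        # unseen tokens contribute nothing in either direction.
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent database per user (hypothetical user names).
stores = defaultdict(UserBayes)
newsletter = "weekly deals newsletter from example shop"

stores["userA"].learn(newsletter, "ham")   # A asked for it
stores["userB"].learn(newsletter, "spam")  # B never did

for user in ("userA", "userB", "userC"):
    print(user, round(stores[user].spam_probability(newsletter), 2))
```

The same text lands below 0.5 for user A and above it for user B, while user C, who has trained nothing, sits at exactly 0.5. That last case is the BAYES_50 situation: the classifier has no opinion, and other rules have to carry the score.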