Re: Tips on training bayes?

Greg Troxel Wed, 18 Sep 2024 16:27:06 -0700

Alex <mysqlstud...@gmail.com> writes:

> It's only these few types of messages that are very subjective and
> experience from the broader open source community would be appreciated.
>
> If it has a legitimate unsubscribe link, does that make it ham?
>
> What criteria do you use to determine "spamminess/haminess of EVERY
> message"?


I think you're asking the wrong question.

A message is spam if it is bulk and unsolicited.  So it is ham if the
user asked for it (truly asked, not failed to uncheck a pre-checked
box), and it is spam if they did not.   Of course humans are not
reliable about remembering.

Having an unsubscribe link *absolutely* does not make it ham.

So you have to ask users to classify, or you are just guessing.


> Is the goal to have every message one of either BAYES_00 or BAYES_99 or is
> it okay that newsletters (for example) are BAYES_50, and let other rules,
> like network checks, determine the score?

In general, the closer to the edge something is, the more useful the score,
but you can't actually push them all to 00/99.  There could be a
newsletter that user A asked for, and is thus ham, but that user B did not
ask for, so when it arrives for them it is spam.

Thus, you tend to need per-user bayes.
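
To make that concrete, here is a toy sketch in Python.  It is not
SpamAssassin's actual Bayes algorithm; TinyBayes, the token list, and the
smoothing are all made up for illustration.  It only shows that one
shared database trained with conflicting labels drifts to the middle,
while per-user databases can each lean the way their user wants.

```python
from collections import defaultdict


class TinyBayes:
    """Toy token-count classifier (hypothetical, not SpamAssassin's algorithm)."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def learn(self, tokens, is_spam):
        # Count each distinct token once per trained message.
        counts = self.spam if is_spam else self.ham
        for token in set(tokens):
            counts[token] += 1

    def score(self, tokens):
        # Average per-token spam probability with add-one smoothing.
        probs = []
        for token in set(tokens):
            s, h = self.spam[token], self.ham[token]
            probs.append((s + 1) / (s + h + 2))
        return sum(probs) / len(probs) if probs else 0.5


newsletter = ["weekly", "deals", "newsletter"]

# One shared database: user A trains it as ham, user B trains it as spam.
shared = TinyBayes()
shared.learn(newsletter, is_spam=False)
shared.learn(newsletter, is_spam=True)

# One database per user: each only sees its own user's verdict.
per_user = {"A": TinyBayes(), "B": TinyBayes()}
per_user["A"].learn(newsletter, is_spam=False)
per_user["B"].learn(newsletter, is_spam=True)

shared_score = shared.score(newsletter)
a_score = per_user["A"].score(newsletter)
b_score = per_user["B"].score(newsletter)
```

In this toy the shared database scores the newsletter right in the middle,
while A's database scores it below the midpoint and B's above it.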


But if bayes says 50, that's life and you roll with it.
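
Rolling with it works because SpamAssassin adds up the scores of every
rule that hit and compares the total with required_score (5.0 by
default).  A rough sketch of that arithmetic follows; the rule scores
and the EXAMPLE_* rule names are invented for illustration and are not
the shipped values or real rule names.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Hypothetical scores, for illustration only; not the values SpamAssassin ships.
RULE_SCORES = {
    "BAYES_50": 0.8,
    "EXAMPLE_DNSBL_HIT": 3.0,   # made-up stand-in for a network check
    "EXAMPLE_URIBL_HIT": 2.5,   # made-up stand-in for a network check
}


def total_score(hits):
    """Sum the scores of all rules that hit on a message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits):
    """A message is tagged as spam once the total reaches the threshold."""
    return total_score(hits) >= REQUIRED_SCORE


# A newsletter where Bayes is undecided and nothing else fires.
newsletter_hits = ["BAYES_50"]

# Bayes is equally undecided, but the network checks fire.
junk_hits = ["BAYES_50", "EXAMPLE_DNSBL_HIT", "EXAMPLE_URIBL_HIT"]
```

With those made-up numbers the first message stays under the threshold
and the second goes over it, even though Bayes said the same thing about
both.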
