Hi,

Did you understand that all
tokens are learned, regardless of whether they have been seen before?

That doesn't really matter from a user perspective, though, right? I
mean, if tokens that have already been learned are learned again, the
net result is zero.

Very much not zero. Each token has several values associated with it:
  # ham
  # spam
  time-stamp

So each time it's learned, its respective ham or spam counter is incremented,
which indicates how spammy or hammy a given token is, and its time-stamp is
updated, indicating how "fresh" the token is. The bayes expiry process removes
"stale" tokens when it runs, pruning the database down to size.
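To make the bookkeeping above concrete, here is a minimal sketch in Python (a hypothetical `TokenDB`, not SpamAssassin's actual implementation) of a token store where each token carries a ham count, a spam count, and a last-seen time-stamp — showing why re-learning is not a no-op:

```python
import time

class TokenDB:
    """Toy token database: token -> (ham_count, spam_count, timestamp)."""

    def __init__(self):
        self.tokens = {}

    def learn(self, tokens, as_spam):
        # Re-learning an already-known token is NOT a net-zero operation:
        # its counter is incremented and its timestamp refreshed each time.
        for tok in tokens:
            ham, spam, _ = self.tokens.get(tok, (0, 0, 0.0))
            if as_spam:
                spam += 1
            else:
                ham += 1
            self.tokens[tok] = (ham, spam, time.time())

    def expire(self, max_age_seconds):
        # Sketch of bayes expiry: drop tokens whose timestamp is "stale".
        cutoff = time.time() - max_age_seconds
        self.tokens = {t: v for t, v in self.tokens.items()
                       if v[2] >= cutoff}

db = TokenDB()
db.learn(["cheap", "meds"], as_spam=True)
db.learn(["cheap", "meds"], as_spam=True)   # counters go to 2, not back to 0
db.learn(["meeting"], as_spam=False)
```

The real store also tracks per-database totals and uses atime-based expiry, but the principle is the same: every learn pass moves the counters and freshens the token.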

Ah, yes, of course. I knew about that, but somehow didn't put it together with this.

I would like to know why, after training similar messages a number of times, it still shows the same bayes score on new similar messages.

I'd also like to figure out why, or how many more times, a message needs to be re-trained before the score reflects the desired classification.
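One simplified way to see why retraining moves things slowly (this is a rough per-token estimate, ignoring the smoothing and ham/spam totals the real code applies): a token's spam probability is dragged back by its accumulated ham count, so each extra "learn as spam" pass shifts it less than you might hope.

```python
def token_prob(ham, spam):
    # Naive estimate: fraction of learnings that were spam.
    # (The real estimator is weighted and smoothed; this is illustrative.)
    return spam / (ham + spam)

ham, spam = 10, 0   # token previously seen only in ham
for i in range(1, 11):
    spam += 1       # one more "learn as spam" pass
    print(i, round(token_prob(ham, spam), 2))
```

After ten re-trainings this token has only climbed to 0.5 — exactly neutral — because the old ham count never goes away until expiry removes the token. That is one mechanism by which a repeatedly trained message can stay stuck near BAYES_50.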

I have a particular FN that frequently scores bayes50, sometimes lower. A few dozen similar messages every day are tagged as spam properly, but they still show bayes50. I pull them out of quarantine and keep training them as spam, but a few still get through every day.

Is there any particular analysis I can do on one of the FNs that can tell me how far the bayes50 is from becoming bayes99 on a similar message?

Hopefully that's clear. I understand there are a large number of variables involved here, and I would think that the fewer tokens a message has, the harder it should be to persuade, but it's frustrating to see bayes50 so repeatedly...

Thanks,
Alex
