Hi,

Did you understand that all
tokens are learned, regardless of whether they have been seen before?

That doesn't really matter from a user perspective, though, right? I
mean, if tokens that have already been learned are learned again, the
net result is zero.

Very much not zero. Each token has several values associated with it:
  # ham
  # spam
  time-stamp

So each time it's learned, its respective ham or spam counter is incremented,
which indicates how spammy or hammy a given token is, and its time-stamp is
updated, indicating how "fresh" the token is. The bayes expiry process removes
"stale" tokens when it runs, pruning the database down to size.
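To make the bookkeeping above concrete, here is a minimal sketch in Python (a hypothetical `TokenDB`, not SpamAssassin's actual implementation) of a token store where each token carries a ham count, a spam count, and a last-seen time-stamp — showing why re-learning is not a no-op:

```python
import time

class TokenDB:
    """Toy token database: token -> (ham_count, spam_count, timestamp)."""

    def __init__(self):
        self.tokens = {}

    def learn(self, tokens, as_spam):
        # Re-learning an already-known token is NOT a net-zero operation:
        # its counter is incremented and its timestamp refreshed each time.
        for tok in tokens:
            ham, spam, _ = self.tokens.get(tok, (0, 0, 0.0))
            if as_spam:
                spam += 1
            else:
                ham += 1
            self.tokens[tok] = (ham, spam, time.time())

    def expire(self, max_age_seconds):
        # Sketch of bayes expiry: drop tokens whose timestamp is "stale".
        cutoff = time.time() - max_age_seconds
        self.tokens = {t: v for t, v in self.tokens.items()
                       if v[2] >= cutoff}

db = TokenDB()
db.learn(["cheap", "meds"], as_spam=True)
db.learn(["cheap", "meds"], as_spam=True)   # counters go to 2, not back to 0
db.learn(["meeting"], as_spam=False)
```

The real store also tracks per-database totals and uses atime-based expiry, but the principle is the same: every learn pass moves the counters and freshens the token.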

Ah, yes, of course. I knew about that, but somehow didn't put it together with this.

I would like to know why, after training similar messages a number of times, it still shows the same bayes score on new similar messages.

I'd also like to figure out why, or how many more times, a message needs to be re-trained before the score reflects the desired classification.
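One simplified way to see why retraining moves things slowly (this is a rough per-token estimate, ignoring the smoothing and ham/spam totals the real code applies): a token's spam probability is dragged back by its accumulated ham count, so each extra "learn as spam" pass shifts it less than you might hope.

```python
def token_prob(ham, spam):
    # Naive estimate: fraction of learnings that were spam.
    # (The real estimator is weighted and smoothed; this is illustrative.)
    return spam / (ham + spam)

ham, spam = 10, 0   # token previously seen only in ham
for i in range(1, 11):
    spam += 1       # one more "learn as spam" pass
    print(i, round(token_prob(ham, spam), 2))
```

After ten re-trainings this token has only climbed to 0.5 — exactly neutral — because the old ham count never goes away until expiry removes the token. That is one mechanism by which a repeatedly trained message can stay stuck near BAYES_50.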

I have a particular FN that frequently scores bayes50, sometimes lower. A few dozen similar messages every day are tagged as spam properly, but they still show bayes50. I pull them out of quarantine and keep training them as spam, but a few still get through every day.

Is there any particular analysis I can do on one of the FNs that can tell me how far the bayes50 is from becoming bayes99 on a similar message?

Hopefully that's clear. I understand there are a large number of variables involved here, and I would think that the fewer tokens a message has, the harder it should be to persuade, but it's frustrating to see bayes50 so repeatedly...

Thanks,
Alex
