On 2/19/2014 9:37 AM, Bowie Bailey wrote:
On 2/18/2014 8:49 PM, Kevin A. McGrail wrote:
On 2/18/2014 6:05 PM, Amir Caspi wrote:
On Feb 18, 2014, at 3:58 PM, John Hardin <jhar...@impsec.org> wrote:
Is there some reason the Bayes scores can't/shouldn't be static?
Indeed, I am wondering why Bayes would be auto-scored at all. By
definition, Bayes high scores should match only on spam, low scores
should match only on ham. That's not perfect, of course, but it is
basically by definition of how Bayes learns.
Given that, it seems to me that the Bayes scores should be static,
and my experience suggests that 99 or 999 should be scored pretty
heavily. (I'd say 00 should be scored heavily negative, but I get
enough FNs with 00 that I don't like that idea... though that probably
means my DB is borked or my ham is full of spammy tokens.)
Actually it's a bit the opposite, especially if you use autolearn, where
scoring too high on the 99% end can cause low-percentage corpora to swing
heavily toward the high score too rapidly.
Bayes scores are not included when determining what to autolearn, so
changing the Bayes scores should have no effect on autolearning.
Or am I missing something?
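Here is a minimal sketch of the behaviour described above. It is a simplified model, not SpamAssassin's actual code. The autolearn score is summed only over non-Bayes rule hits, so changing the BAYES_99/BAYES_999 scores cannot change whether a message crosses a learning threshold. The names RuleHit and autolearn_decision are hypothetical. The threshold values are assumed to be the documented defaults of bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0). The header/body point minimums, tflags such as autolearn_force, and bayes_auto_learn_on_error are deliberately left out, and those are the permutations that still need checking.

```python
# Simplified, hypothetical model of the autolearn decision; NOT SpamAssassin's code.
# Omitted on purpose: header/body point minimums, tflags (e.g. autolearn_force),
# and bayes_auto_learn_on_error.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RuleHit:
    name: str
    score: float
    # Stands in for rules excluded from the autolearn score
    # (assumed here to be the BAYES_* rules).
    is_bayes: bool = False


THRESHOLD_NONSPAM = 0.1  # assumed default of bayes_auto_learn_threshold_nonspam
THRESHOLD_SPAM = 12.0    # assumed default of bayes_auto_learn_threshold_spam


def autolearn_decision(hits: Iterable[RuleHit]) -> Optional[str]:
    """Return "spam", "ham", or None (do not learn) for one message."""
    # Bayes hits are skipped, so their scores never count toward either threshold.
    learn_score = sum(h.score for h in hits if not h.is_bayes)
    if learn_score > THRESHOLD_SPAM:
        return "spam"
    if learn_score < THRESHOLD_NONSPAM:
        return "ham"
    return None
```

In this model, raising BAYES_99 from 3.5 to 999 leaves a message that scores 9.0 on other rules unlearned either way. So any feedback loop would have to come from the parts the sketch leaves out.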
I would have to look at the permutations of bayes_auto_learn_on_error,
bayes_auto_learn_threshold_spam and the tflag autolearn_force to answer
that question but my memory is that this is a self-perpetuating cycle
that I've seen on live servers when testing scoring.
regards,
KAM