jdow wrote:
> From: "Bart Schaefer" <[EMAIL PROTECTED]>
> > 
> > On 4/29/06, Matt Kettler <[EMAIL PROTECTED]> wrote:
> > > In SA 3.1.0 they did force-fix the scores of the bayes rules,
> > > particularly the high-end. The perceptron assigned BAYES_99 a
> > > score of 1.89 in the 3.1.0 mass-check run. The devs jacked it up
> > > to 3.50.
> > > 
> > > That does make me wonder if:
> > >     1) When BAYES_9x FPs, it FPs in conjunction with lots of
> > > other rules due to the ham corpus being polluted with spam.
> > 
> > My recollection is that there was speculation that the BAYES_9x
> > rules were scored "too low" not because they FP'd in conjunction
> > with other rules, but because against the corpus they TRUE P'd in
> > conjunction with lots of other rules, and that it therefore wasn't
> > necessary for the perceptron to assign a high score to BAYES_9x in
> > order to push the total over the 5.0 threshold.
> > 
> > The trouble with that is that users expect training on their
> > personal spam flow to have a more significant effect on the
> > scoring.  I want to train bayes to compensate for the LACK of
> > other rules matching, not just to give a final nudge when a bunch
> > of others already hit.
> > 
> > I filed a bugzilla some while ago suggesting that the bayes
> > percentage ought to be used to select a rule set, not to adjust
> > the score as a component of a rule set.
> 
> There is one other gotcha. I bet vastly different scores are
> warranted for Bayes when run with per user training and rules as
> compared to global training and rules.

Ack!  I missed the subject change on this thread prior to my last
reply.  Sorry about the duplication.

I think it is also a matter of manual training vs. autolearning.  A
Bayes database that is consistently trained by hand will be more
accurate than one fed by autolearning, and can therefore safely
support higher scores.
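For anyone who wants to act on that locally rather than wait on the
stock scores, both knobs live in local.cf.  A minimal sketch (the
exact score values here are illustrative, not recommendations -- pick
them based on how clean your own training data is):

```
# local.cf sketch -- values are examples, tune to your own corpus
bayes_auto_learn 0       # disable autolearning; train only via sa-learn
score BAYES_99 4.0       # raise from the shipped 3.50 if your DB is clean
score BAYES_95 3.2
```

With autolearning off, only messages you explicitly feed to sa-learn
enter the database, which is what makes the higher overrides defensible.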

-- 
Bowie
