On Tue, 31 Mar 2009 18:34:05 +0200 (CEST)
"Benny Pedersen" <m...@junc.org> wrote:

> 
> On Tue, March 31, 2009 17:53, RW wrote:
> > I think it would be nice if SA could handle this automatically
> > e.g. if ham is over-represented then only autolearn ham where
> > p>0.001, and vice versa.
> 
> it already does

I'm not sure what you mean here. What I was suggesting is that
autolearning be modified to keep the spam:ham ratio within reasonable
limits. That doesn't appear to happen at present, given that people
are ending up with 10:1 ratios.
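
As a rough sketch of what I mean (Python, not SpamAssassin code: the
function name, the 2:1 ratio limit and the 0.999 spam cutoff are
assumptions of mine for illustration; only the 0.001 figure comes from
my earlier example):

```python
def should_autolearn(is_spam, bayes_p, nspam, nham, max_ratio=2.0):
    """Decide whether a message that has already passed the normal
    autolearn score thresholds should actually be learned.

    bayes_p is the Bayes spam probability of the message (0.0 to 1.0).
    nspam and nham are the message counts already in the database.
    """
    # Work out which class, if any, is over-represented.
    ham_heavy = nham > max_ratio * max(nspam, 1)
    spam_heavy = nspam > max_ratio * max(nham, 1)

    if not is_spam and ham_heavy:
        # Skip ham that Bayes already classifies confidently as ham.
        return bayes_p > 0.001
    if is_spam and spam_heavy:
        # Mirror image for spam (0.999 is an assumed cutoff).
        return bayes_p < 0.999
    # Ratio is within limits, or this message is from the scarce class.
    return True
```

With a 10:1 ham-heavy database, this would still learn every spam
candidate but skip ham that Bayes already gets right with high
confidence.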


> > At the moment the only way of tweaking this is to vary the
> > thresholds, which is about the worst possible way of doing it.
>
> why ?

Because it distorts the databases. If you push the ham autolearn
threshold down to learn less ham, learning becomes much more selective.
For example, you can end up learning all the ham from domain A and none
from domain B, while still autolearning spam from domain B.

And in general, the closer a mail scores to 5.0, the more valuable it
is to learn. Maintaining the ham:spam ratio by discarding the more
valuable candidates for learning doesn't make much sense to me.
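
To illustrate with made-up numbers (a Python sketch; both helper names
and the scores are hypothetical, and 5.0 is just the usual spam
threshold), compare cutting ham by lowering the threshold with cutting
it by keeping the candidates nearest 5.0:

```python
def keep_by_threshold(ham_scores, ham_threshold):
    """Ham candidates that survive a lowered autolearn threshold."""
    return [s for s in ham_scores if s < ham_threshold]


def keep_most_valuable(ham_scores, n, spam_threshold=5.0):
    """The n ham candidates scoring closest to the spam threshold."""
    return sorted(ham_scores, key=lambda s: abs(s - spam_threshold))[:n]


ham_scores = [-2.6, -0.4, 0.05, -1.1]

# Lowering the ham threshold to -1.0 keeps the two "easy" messages.
print(keep_by_threshold(ham_scores, -1.0))

# Keeping two by closeness to 5.0 keeps the two borderline messages.
print(keep_most_valuable(ham_scores, 2))
```

Both approaches learn two of the four messages, but the threshold
approach throws away exactly the borderline ones.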
