On Mon, 23 Feb 2015 00:22:31 +0100
Reindl Harald wrote:

> >> in doubt the amount of trained ham and spam should be near 50%,
> >
> > This is myth. What's important is to have enough of each, the actual
> > ratio is not important.
> 
> true - but you don't have much to measure the "enough of each" and so 
> try to keep 50/50 is a good starting point - hence i said "in doubt"

A few thousand of each is a good starting point, but having too little
spam or ham is not a good reason to cut back on learning the other.


> finally you get at least a problem in both cases:
> 
> * 1% ham samples, 99% spam samples
> * 1% spam samples, 99% ham samples
> 
> then bayes picks up a trend

No, it doesn't; the ratio doesn't create a bias. There's nothing
intrinsically wrong with 1:99 if the 1% is enough; 100:9900 is bad
because 100 is too small, not because of the ratio.
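
To illustrate why the ratio alone doesn't create a bias, here is a
minimal sketch of a Graham/Robinson-style per-token estimate. It assumes
that token counts are normalized by the number of messages trained in
each class before they are compared, which is how such filters are
usually described; the function name and the numbers are hypothetical
and this is not SpamAssassin's actual code.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Per-token spam probability from per-class frequencies.

    Hypothetical helper: each count is divided by the size of its own
    corpus, so only the frequencies within each class are compared.
    """
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


# A token seen in 20% of spam and 5% of ham, trained at 50:50 ...
balanced = token_spam_prob(1000, 250, 5000, 5000)

# ... and the same token frequencies trained at 99:1.
skewed = token_spam_prob(19800, 50, 99000, 1000)

# A token equally common in both classes stays neutral at 99:1.
neutral = token_spam_prob(9900, 100, 99000, 1000)

print(balanced, skewed, neutral)
```

Both corpora give the same estimate for the same per-class frequencies.
What changes with a small class is the granularity: with only 100
messages, a single extra hit moves that class's frequency by a full
percentage point, so the estimate is noisy.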
