On Mon, 23 Feb 2015 00:22:31 +0100 Reindl Harald wrote:
> >> in doubt the amout of trained ham and spam should be near 50%,
> >
> > This is myth. What's important is to have enough of each, the actual
> > ratio is not important.
>
> true - but you don't have much to measure the "enough of each" and so
> try to keep 50/50 is a good starting point - hence i said "in doubt"

A few thousand of each is a good starting point, but having too little
spam or ham is not a good reason to cut back on learning the other.

> finally you get lest a problem in both cases:
>
> * 1% ham samples, 99% spam samples
> * 1% spam samples, 99% ham samples
>
> they bayes occupies a trend

No, it doesn't; the ratio doesn't create a bias. There's nothing
intrinsically wrong with 1:99 if the 1% is enough; 100:9900 is bad
because 100 is too small, not because of the ratio.
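
To illustrate why the ratio alone doesn't bias anything, here is a minimal
sketch in Python. It assumes a simplified Graham/Robinson-style per-token
estimate built from per-class frequencies; it is not SpamAssassin's actual
code, and the function name and all counts are made up for illustration.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token spam probability (illustrative assumption).

    Hits are divided by the size of their own corpus first, so the
    spam:ham ratio cancels out of the estimate.
    """
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


# A token seen in 20% of spam and 1% of ham.
balanced = token_spam_prob(1000, 50, 5000, 5000)      # 1:1 corpus
skewed = token_spam_prob(20000, 50, 100000, 5000)     # 20:1 corpus

# With only 100 ham, "1% of ham" means about one message, so chance
# decides whether the token was seen 0, 1 or 3 times.
tiny_ham = [token_spam_prob(1980, hits, 9900, 100) for hits in (0, 1, 3)]

print(balanced, skewed)
print(tiny_ham)
```

The balanced and the skewed corpus give the same estimate because the
per-class frequencies are the same. With 100 ham the estimate swings
noticeably on one or two messages, which is the "too small" problem and
has nothing to do with the ratio.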