On Fri, 20 Feb 2015 21:36:38 +0100
Reindl Harald wrote:

>
> > And I'd suggest the same for non-spam, train duplicative ham even
> > if it happens to be similarly addressed to different users. More
> > data is (nearly) always better for bayesian learning systems
> > of course

With the caveat that you keep an eye on retention.

> in doubt the amount of trained ham and spam should be near 50%,

This is a myth. What's important is to have enough of each; the actual
ratio is not important.
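
A rough sketch of why the ratio tends to wash out, assuming a filter
that scores each token from per-class frequencies (hits divided by the
number of messages trained in that class), as Graham/Robinson-style
Bayes filters generally do. The function name and all the counts are
made up for illustration; this is not any particular filter's actual
code.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Hypothetical helper: per-token spam probability.

    Each count is normalised by the size of its own class, so the
    overall ham:spam ratio cancels out of the result.
    """
    spam_freq = spam_hits / n_spam  # fraction of trained spam containing the token
    ham_freq = ham_hits / n_ham     # fraction of trained ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Token appears in 20% of spam and 1% of ham; corpus is balanced 1:1.
balanced = token_spam_prob(200, 10, 1000, 1000)

# Same underlying rates, but ten times as much ham has been trained.
skewed = token_spam_prob(200, 100, 1000, 10000)

# Far too little of either class: with only 3 ham messages the token
# was never seen in ham, so the estimate jumps straight to certainty.
sparse = token_spam_prob(2, 0, 10, 3)

print(balanced, skewed, sparse)
```

The balanced and skewed corpora give the same score because the
per-class rates are the same. The sparse case is the one that goes
wrong, and it goes wrong for lack of data, not because of the ratio.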