On Wed, 14 Feb 2018 16:20:30 +0100
Matus UHLAR - fantomas wrote:

> >On Tue, 13 Feb 2018 21:02:46 +0000
> >Horváth Szabolcs wrote:
> >> One more question: is there a recommended ham to spam ratio? 1:1?
>
> On 14.02.18 15:09, RW wrote:
> >No, this is a myth. Bayes computes token probabilities from a
> >token's frequencies in spam and ham, so it all scales through. If
> >you have 2000 ham and 200 spam the problem is too few spams, not a
> >bad ratio.
>
> my experience says you will need more ham than spam, because you want
> to get rid of false positives (ham marked as spam) much more than of
> false negatives.
My point is that an imbalance between ham and spam doesn't, by itself, create a bias.
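To illustrate why the ratio cancels, here is a minimal Python sketch. It assumes a simplified Graham/Robinson-style per-token estimate, not SpamAssassin's actual Bayes code. The function names, the prior strength s=1 and the prior x=1/2 are illustrative assumptions. The raw estimate divides each hit count by its own class size, so a token found in 10% of spam and 1% of ham gets the same value at 1:1 or 1:10. The smoothed estimate pulls rarely seen tokens toward 0.5. That is where "too few spams" shows up: adding ham never lowers the score, but a small spam corpus means fewer observations.

```python
from fractions import Fraction


def raw_spam_prob(spam_hits, n_spam, ham_hits, n_ham):
    """Per-token spam probability from per-class relative frequencies.

    spam_hits / ham_hits: spam / ham messages containing the token.
    n_spam / n_ham: total spam / ham messages trained.
    Assumes the token was seen at least once (otherwise division by zero).
    """
    spam_freq = Fraction(spam_hits, n_spam)  # fraction of spam containing it
    ham_freq = Fraction(ham_hits, n_ham)     # fraction of ham containing it
    # Normalising by class size is what makes the ham:spam ratio cancel out.
    return spam_freq / (spam_freq + ham_freq)


def smoothed_spam_prob(spam_hits, n_spam, ham_hits, n_ham,
                       s=Fraction(1), x=Fraction(1, 2)):
    """Robinson-style smoothing (illustrative s and x, not SpamAssassin's).

    With few observations the estimate is pulled toward the prior x;
    s is how much weight the prior gets.
    """
    n = spam_hits + ham_hits  # messages in which the token was seen
    p = raw_spam_prob(spam_hits, n_spam, ham_hits, n_ham)
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Token seen in 10% of spam and 1% of ham.
    print(raw_spam_prob(200, 2000, 20, 2000))  # 1:1 corpus  -> 10/11
    print(raw_spam_prob(20, 200, 20, 2000))    # 1:10 corpus -> 10/11
    # 200 spam with 2000 ham scores higher than 200 spam with 200 ham:
    # extra ham adds evidence, it does not add bias.
    print(float(smoothed_spam_prob(20, 200, 20, 2000)))
    print(float(smoothed_spam_prob(20, 200, 2, 200)))
```

In this sketch the shrinkage depends only on how many messages contain the token. Growing the spam corpus raises that count, and rebalancing the ratio by itself does not.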