Up-sampling one class would change the result, yes. The prior for C2 would increase, and so would its posterior across the board.
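To make that concrete, here is a minimal sketch in plain Scala (no Spark dependency; the log-likelihood values are made up) of how the class prior enters the Naive Bayes posterior, log P(c|x) = log P(c) + log P(x|c) + const. Up-sampling C2 tenfold would make the two priors equal, which is exactly the same as dropping the prior term:

object PriorSketch {

  // Renormalize per-class log-scores into posterior probabilities.
  def posterior(logScores: Map[String, Double]): Map[String, Double] = {
    val exps = logScores.map { case (c, s) => c -> math.exp(s) }
    val z = exps.values.sum
    exps.map { case (c, e) => c -> e / z }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical per-class log-likelihoods log P(x|c) for one sample.
    val logLikelihood = Map("C1" -> -3.0, "C2" -> -2.5)

    // Empirical priors from the training counts: 100 vs 10.
    val counts = Map("C1" -> 100.0, "C2" -> 10.0)
    val total = counts.values.sum
    val logPrior = counts.map { case (c, n) => c -> math.log(n / total) }

    // With the empirical prior, C2's posterior is dragged down.
    val withPrior = logLikelihood.map { case (c, ll) => c -> (logPrior(c) + ll) }
    println(s"Empirical prior: ${posterior(withPrior)}")

    // Equal (uniform) priors contribute the same constant to every class,
    // so this is identical to dropping the prior term altogether.
    println(s"Uniform prior:   ${posterior(logLikelihood)}")
  }
}

With these made-up numbers, the empirical 100:10 prior pushes the posterior toward C1 (about 0.86 vs 0.14) even though C2 has the higher likelihood; under the uniform prior the likelihood alone decides (about 0.38 vs 0.62).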
But that is dramatically changing your input: C2 isn't as prevalent as C1, but you are pretending it is. Their priors aren't the same. If you want to assume a uniform prior, just do so directly and drop the prior term.

On Thu, Nov 20, 2014 at 1:04 PM, jatinpreet <jatinpr...@gmail.com> wrote:
> Sean,
>
> My last sentence didn't come out right. Let me try to explain my question
> again.
>
> For instance, I have two categories, C1 and C2. I have trained 100 samples
> for C1 and 10 samples for C2.
>
> Now, I predict two samples, one each of C1 and C2, namely S1 and S2
> respectively. I get the following prediction results:
>
> S1 => Category: C1, Probability: 0.7
> S2 => Category: C2, Probability: 0.04
>
> Now, both predictions are correct, but their probabilities are far apart.
> Can I improve the prediction probability by taking the 10 samples I have of
> C2 and replicating each of them 10 times, making the total count equal to
> 100, the same as C1?
>
> Can I expect this to increase the probability of sample S2 after training
> on the new set? Is this a viable approach?
>
> Thanks,
> Jatin