Up-sampling one class would change the result, yes. The prior for C2
would increase, and so would its posterior across the board.
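
(Concretely, with the counts below, 100 samples of C1 and 10 of C2,
the estimated prior is P(C2) = 10/110 ≈ 0.09. Replicate each C2 sample
10 times and it becomes 100/200 = 0.5, the same as C1's.)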

But that is dramatically changing your input: C2 isn't as prevalent as
C1, but you are pretending it is. Their priors aren't the same.

If you want to assume a uniform prior, just do so directly and drop
the prior term.
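
In MLlib (Spark 1.x), NaiveBayesModel exposes pi (the log class
priors) and theta (the log conditional probabilities), so you can
score with or without the prior term yourself. A minimal sketch,
assuming a multinomial model and a feature vector of term counts:

import org.apache.spark.mllib.classification.NaiveBayesModel
import org.apache.spark.mllib.linalg.Vector

// Unnormalized log-posterior for each class. Dropping pi(i) is
// equivalent to assuming a uniform prior over classes.
def classScores(model: NaiveBayesModel, doc: Vector,
                usePrior: Boolean): Array[Double] = {
  val x = doc.toArray
  model.labels.indices.map { i =>
    val logLikelihood =
      model.theta(i).zip(x).map { case (logTheta, xj) => logTheta * xj }.sum
    if (usePrior) model.pi(i) + logLikelihood else logLikelihood
  }.toArray
}

Exponentiate and normalize the scores if you want posterior
probabilities; the predicted label is model.labels at the argmax of
the scores.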

On Thu, Nov 20, 2014 at 1:04 PM, jatinpreet <jatinpr...@gmail.com> wrote:
> Sean,
>
> My last sentence didn't come out right. Let me try to explain my question
> again.
>
> For instance, I have two categories, C1 and C2. I have trained on 100
> samples for C1 and 10 samples for C2.
>
> Now, I predict two samples, one each of C1 and C2, namely S1 and S2
> respectively. I get the following prediction results:
>
> S1=> Category: C1, Probability: 0.7
> S2=> Category: C2, Probability: 0.04
>
> Now, both the predictions are correct but their probabilities are far apart.
> Can I improve the prediction probability by taking the 10 samples I have of
> C2 and replicating each of them 10 times, making the total count 100, the
> same as for C1?
>
> Can I expect this to increase the probability of sample S2 after training
> on the new set? Is this a viable approach?
>
> Thanks,
> Jatin
>
>
>
> -----
> Novice Big Data Programmer
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Baye-s-classification-confidence-tp19341p19366.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
