Re: Naive Bayes classification confidence

2014-11-20 Thread jatinpreet
I believe assuming uniform priors is the way to go for my use case. I am not sure how to 'drop the prior term' in MLlib; I am just providing the samples as they come after creating term vectors for each sample. But I guess I can Google that information. I appreciate all the help.
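For context, MLlib's multinomial Naive Bayes model exposes the log class priors as `pi` and the log conditional term probabilities as `theta`, so "dropping the prior term" amounts to leaving the log prior out of the per-class score. Here is a minimal sketch in plain Python (the counts and probabilities are invented for illustration, mirroring the 100-vs-10 training split discussed in this thread):

```python
import math

# Hypothetical trained parameters, analogous to MLlib's `pi` (log class
# priors) and `theta` (log conditional term probabilities). Invented values.
log_prior = [math.log(100 / 110), math.log(10 / 110)]  # C1, C2
log_theta = [
    [math.log(0.7), math.log(0.3)],  # log P(term | C1)
    [math.log(0.2), math.log(0.8)],  # log P(term | C2)
]

def log_posterior(term_counts, use_prior=True):
    """Unnormalized log posterior per class; omit the prior for a uniform one."""
    scores = []
    for c in range(len(log_theta)):
        s = log_prior[c] if use_prior else 0.0
        s += sum(n * lt for n, lt in zip(term_counts, log_theta[c]))
        scores.append(s)
    return scores

doc = [1, 3]  # term-frequency vector for one document
with_prior = log_posterior(doc)
uniform = log_posterior(doc, use_prior=False)
```

With these numbers the imbalanced prior tips the prediction to C1, while the uniform (dropped) prior picks C2, which is exactly the sensitivity to training-set imbalance being discussed.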

Re: Naive Bayes classification confidence

2014-11-20 Thread Sean Owen
Up-sampling one class would change the result, yes. The prior for C2 would increase, and so would its posterior across the board. But that is dramatically changing your input: C2 isn't as prevalent as C1, but you are pretending it is. Their priors aren't the same. If you want to assume a uniform prior…
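The effect described here can be seen with one line of Bayes-rule arithmetic; the likelihood values below are invented purely for illustration:

```python
# How up-sampling C2 shifts its prior, and therefore its posterior, for a
# fixed observation. Likelihoods P(x | C1) and P(x | C2) are made up.
def posterior_c2(prior_c2, lik_c1=0.5, lik_c2=0.2):
    prior_c1 = 1 - prior_c2
    num = prior_c2 * lik_c2
    return num / (prior_c1 * lik_c1 + num)

original = posterior_c2(10 / 110)  # 10 of 110 training docs are C2
upsampled = posterior_c2(0.5)      # after duplicating C2 docs to parity
```

Same likelihoods, same document: only the prior moved, yet the posterior for C2 jumps from about 0.04 to about 0.29.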

Re: Naive Bayes classification confidence

2014-11-20 Thread jatinpreet
Sean, my last sentence didn't come out right. Let me try to explain my question again. For instance, I have two categories, C1 and C2. I have trained 100 samples for C1 and 10 samples for C2. Now I predict two samples, one each from C1 and C2, namely S1 and S2 respectively. I get the following predictions…

Re: Naive Bayes classification confidence

2014-11-20 Thread Sean Owen
Yes, certainly you need to consider the problem of how and when you update the model with new information. The principle is the same. Low or high posteriors aren't wrong per se. It seems normal, in fact, that one class is more probable than others, maybe a lot more. On Thu, Nov 20, 2014 at 10:31 AM, jatinpreet…

Re: Naive Bayes classification confidence

2014-11-20 Thread jatinpreet
Thanks a lot, Sean. You are correct in assuming that my examples fall under a single category. It is interesting to see that the posterior probability can actually be treated as something stable enough to have a constant threshold value on a per-class basis. It would, I assume, keep changing…

Re: Naive Bayes classification confidence

2014-11-20 Thread Sean Owen
I assume that all examples do actually fall into exactly one of the classes. If you always have to make a prediction, then you always take the most probable class. If you can choose to make no classification for lack of confidence, then yes, you want to pick a per-class threshold and take the most likely class…
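The classify-or-abstain scheme sketched in this reply might look like the following; the class names and threshold values are invented for illustration:

```python
# Per-class confidence thresholds (hypothetical values): take the most
# probable class only if its posterior clears that class's threshold,
# otherwise make no classification.
THRESHOLDS = {"C1": 0.8, "C2": 0.6}

def classify(posteriors):
    """Return the most probable class if confident enough, else None."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= THRESHOLDS[best] else None
```

For example, `classify({"C1": 0.9, "C2": 0.1})` returns `"C1"`, while `classify({"C1": 0.7, "C2": 0.3})` returns `None`: C1 is still the most likely class, but falls short of its 0.8 threshold.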