I believe assuming uniform priors is the way to go for my use case.
I am not sure how to 'drop the prior term' with MLlib. I am just
providing the samples as they come, after creating term vectors for each
sample. But I guess I can Google that information.
I appreciate all the help.
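To make "dropping the prior term" concrete, here is a small pure-Python sketch. It is not MLlib code, and the helper names `log_scores` and `predict` and all the numbers are hypothetical. In multinomial naive Bayes, the log-score of class c is log P(c) plus the sum over terms of count times log P(term | c). A uniform prior adds the same constant log(1/K) to every class, so omitting the prior term picks the same class as assuming uniform priors. Keeping the frequency-based prior, by contrast, can flip the decision.

```python
import math


def log_scores(term_counts, log_priors, log_likelihoods, use_prior=True):
    """Multinomial naive Bayes log-score per class (hypothetical helper).

    term_counts: dict term -> count for one sample (its term vector)
    log_priors: dict class -> log P(class)
    log_likelihoods: dict class -> dict term -> log P(term | class)
    """
    scores = {}
    for c, term_logp in log_likelihoods.items():
        s = sum(n * term_logp[t] for t, n in term_counts.items())
        if use_prior:
            s += log_priors[c]  # the "prior term"
        scores[c] = s
    return scores


def predict(scores):
    # The predicted class is the one with the highest log-score.
    return max(scores, key=scores.get)


# Toy model: priors from 100 vs 10 training samples (illustrative numbers only).
log_priors = {"C1": math.log(100 / 110), "C2": math.log(10 / 110)}
log_likelihoods = {
    "C1": {"spark": math.log(0.2), "bayes": math.log(0.8)},
    "C2": {"spark": math.log(0.7), "bayes": math.log(0.3)},
}
uniform = {c: math.log(1 / len(log_likelihoods)) for c in log_likelihoods}

sample = {"spark": 2, "bayes": 1}
with_prior = predict(log_scores(sample, log_priors, log_likelihoods))
no_prior = predict(log_scores(sample, log_priors, log_likelihoods, use_prior=False))
uniform_prior = predict(log_scores(sample, uniform, log_likelihoods))
print(with_prior, no_prior, uniform_prior)  # prints: C1 C2 C2
```

With these toy numbers, the 100-vs-10 prior is enough to tip the sample to C1, while both the no-prior and the uniform-prior versions pick C2. Whether and how MLlib exposes the prior term is not assumed here.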
Up-sampling one class would change the result, yes. The prior for C2
would increase, and so would its posterior across the board.
But that is dramatically changing your input: C2 isn't as prevalent as
C1, but you are pretending it is. Their priors aren't the same.
If you want to assume a uniform pr
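Up-sampling changes only what the model believes about class frequency, and this can be shown numerically. The following plain-Python sketch computes priors from class counts and applies Bayes' rule. The helper names and likelihood values are hypothetical. It holds P(x | class) fixed, which is a simplifying assumption: with smoothing, duplicated samples can also shift the estimated term likelihoods slightly.

```python
def priors_from_counts(counts):
    """Priors as class frequencies in the (possibly resampled) training set."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def posteriors(likelihoods, priors):
    """Bayes' rule: P(c | x) = P(x | c) P(c) / sum_k P(x | k) P(k)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


original = priors_from_counts({"C1": 100, "C2": 10})
upsampled = priors_from_counts({"C1": 100, "C2": 100})  # each C2 sample repeated 10 times

# Hypothetical P(x | class) values for three samples, held fixed (an assumption).
samples = [
    {"C1": 0.02, "C2": 0.01},
    {"C1": 0.005, "C2": 0.02},
    {"C1": 0.001, "C2": 0.0001},
]
for lik in samples:
    before = posteriors(lik, original)["C2"]
    after = posteriors(lik, upsampled)["C2"]
    print(f"P(C2 | x): {before:.3f} -> {after:.3f}")
```

For every sample, P(C2 | x) rises once C2 is duplicated from 10 to 100 examples. This is the "across the board" effect: the change comes from the resampling, not from anything new in the data.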
Sean,
My last sentence didn't come out right. Let me try to explain my question
again.
For instance, I have two categories, C1 and C2. I have trained on 100 samples
for C1 and 10 samples for C2.
Now I predict two samples, one each of C1 and C2, namely S1 and S2
respectively. I get the following pre
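The following decision rule for two classes assumes the priors are simply the training-set class frequencies and ignores smoothing. Under those assumptions, the 100-vs-10 setup implies:

```latex
P(C_1) = \frac{100}{110} \approx 0.909, \qquad P(C_2) = \frac{10}{110} \approx 0.091

\hat{c}(x) = C_2
\iff P(x \mid C_2)\,P(C_2) > P(x \mid C_1)\,P(C_1)
\iff \frac{P(x \mid C_2)}{P(x \mid C_1)} > \frac{P(C_1)}{P(C_2)} = 10
\iff \log P(x \mid C_2) - \log P(x \mid C_1) > \ln 10 \approx 2.303
```

For S2 to be labelled C2, its likelihood under C2 has to be more than ten times its likelihood under C1. The prior alone accounts for a log-odds handicap of about 2.3.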
Yes, certainly you need to consider the problem of how and when you
update the model with new info. The principle is the same. Low or high
posteriors aren't wrong per se. In fact, it seems normal that one class
is more probable than the others, maybe a lot more.
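Posteriors, and therefore any fixed threshold, shift as the model is updated. One way to see this: a count-based multinomial naive Bayes can fold in new labeled samples just by adding counts. The following is a from-scratch sketch with Laplace smoothing. The `CountingNB` class, its methods, and the toy data are hypothetical, and the sketch says nothing about how MLlib itself retrains.

```python
import math
from collections import Counter, defaultdict


class CountingNB:
    """Hypothetical count-based multinomial naive Bayes with Laplace smoothing.

    The whole model is a set of counts, so a new labeled sample is folded
    in by adding its counts; nothing else has to be recomputed.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive smoothing constant
        self.class_counts = Counter()             # class -> number of samples
        self.term_counts = defaultdict(Counter)   # class -> term -> total count
        self.vocab = set()

    def update(self, doc, label):
        """Add one labeled term vector (dict term -> count) to the model."""
        self.class_counts[label] += 1
        self.term_counts[label].update(doc)
        self.vocab.update(doc)

    def posteriors(self, doc):
        """Return P(class | doc) for every class seen so far."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        logs = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.term_counts[c].values())
            s = math.log(n_c / n_docs)            # log prior from class frequency
            for t, n in doc.items():
                p = (self.term_counts[c][t] + self.alpha) / (total + self.alpha * v)
                s += n * math.log(p)
            logs[c] = s
        m = max(logs.values())                    # subtract max before exp for stability
        z = sum(math.exp(s - m) for s in logs.values())
        return {c: math.exp(s - m) / z for c, s in logs.items()}


# Toy training data mirroring the 100-vs-10 setup in the thread.
model = CountingNB()
for _ in range(100):
    model.update({"spark": 1, "rdd": 2}, "C1")
for _ in range(10):
    model.update({"bayes": 2, "prior": 1}, "C2")

x = {"spark": 1, "bayes": 1}
before = model.posteriors(x)["C2"]

# Later, 50 more C2 samples arrive and are folded into the counts.
for _ in range(50):
    model.update({"bayes": 2, "prior": 1}, "C2")
after = model.posteriors(x)["C2"]
print(f"P(C2 | x): {before:.3f} -> {after:.3f}")
```

The same input gets a different posterior after the update, even though nothing about the input itself changed.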
On Thu, Nov 20, 2014 at 10:31 AM, jatinp
Thanks a lot, Sean. You are correct in assuming that each of my examples falls under
exactly one category.
It is interesting to see that the posterior probability can actually be
treated as something stable enough to have a constant threshold
value on a per-class basis. It would, I assume, keep changing
I assume that all examples do actually fall into exactly one of the classes.
If you always have to make a prediction, then you always take the most
probable class.
If you can choose to make no classification for lack of confidence, then yes, you
want to pick a per-class threshold and take the most likel
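The two decision modes can be sketched as follows. This is one reading of the per-class threshold advice: among the classes whose posterior clears its own threshold, take the most probable one; otherwise return no classification. The function names and threshold values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def predict_forced(posteriors: Dict[str, float]) -> str:
    """Always make a prediction: take the most probable class."""
    return max(posteriors, key=posteriors.get)


def predict_with_abstain(posteriors: Dict[str, float],
                         thresholds: Dict[str, float]) -> Optional[str]:
    """Return the most probable class whose posterior clears its own
    per-class threshold, or None (no classification) if none does."""
    passing = {c: p for c, p in posteriors.items() if p >= thresholds[c]}
    if not passing:
        return None
    return max(passing, key=passing.get)


# Hypothetical thresholds, for illustration only (not tuned on any data).
thresholds = {"C1": 0.9, "C2": 0.6}

print(predict_forced({"C1": 0.7, "C2": 0.3}))                      # C1
print(predict_with_abstain({"C1": 0.7, "C2": 0.3}, thresholds))    # None
print(predict_with_abstain({"C1": 0.35, "C2": 0.65}, thresholds))  # C2
```

Thresholds like these would presumably need re-checking whenever the model is retrained, since the posteriors themselves move.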