Thanks Sean, it did turn out to be a simple mistake after all. I appreciate your help.
Jatin

On Thu, Nov 27, 2014 at 7:52 PM, sowen [via Apache Spark User List] <ml-node+s1001560n19975...@n3.nabble.com> wrote:

> No, the feature vector is not converted. It contains count n_i of how
> often each term t_i occurs (or a TF-IDF transformation of those). You
> are finding the class c such that P(c) * P(t_1|c)^n_1 * ... is
> maximized.
>
> In log space it's log(P(c)) + n_1*log(P(t_1|c)) + ...
>
> So your n_1 counts (or TF-IDF values) are used as-is and this is where
> the dot product comes from.
>
> Your bug is probably something lower-level and simple. I'd debug the
> Spark example and print exactly its values for the log priors and
> conditional probabilities, and the matrix operations, and yours too,
> and see where the difference is.
>
> On Thu, Nov 27, 2014 at 11:37 AM, jatinpreet <[hidden email]> wrote:
>
> > Hi,
> >
> > I have been running through some troubles while converting the code to Java.
> > I have done the matrix operations as directed and tried to find the maximum
> > score for each category. But the predicted category is mostly different from
> > the prediction done by MLlib.
> >
> > I am fetching iterators of the pi, theta and testData to do my calculations.
> > pi and theta are in log space while my testData vector is not, could that
> > be a problem because I didn't see explicit conversion in Mllib also?
> >
> > For example, for two categories and 5 features, I am doing the following
> > operation,
> >
> > [1,2] + [1 2 3 4 5 ] * [1,2,3,4,5]
> >         [6 7 8 9 10]
> > These are simple element-wise matrix multiplication and addition operators.
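For anyone following the thread, the log-space scoring Sean describes can be sketched in Java roughly as below. This is only a sketch under stated assumptions, not MLlib's own code: the class and method names are made up for illustration, and it assumes pi holds one log prior per class, theta holds one row of log conditional probabilities per class, and the feature vector holds raw counts (or TF-IDF values). The posterior step uses the usual log-sum-exp normalization, which the thread itself does not spell out.

```java
// Hypothetical helper class; none of these names come from MLlib.
final class NaiveBayesScoreSketch {

    // Per-class log score: logPi[c] + sum_i x[i] * logTheta[c][i].
    // The sum over i is the dot product of row c of theta with x.
    static double[] logScores(double[] logPi, double[][] logTheta, double[] x) {
        double[] scores = new double[logPi.length];
        for (int c = 0; c < logPi.length; c++) {
            double s = logPi[c];
            for (int i = 0; i < x.length; i++) {
                s += x[i] * logTheta[c][i]; // x is used as-is, not in log space
            }
            scores[c] = s;
        }
        return scores;
    }

    // Index of the largest score, i.e. the predicted class.
    static int argMax(double[] scores) {
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] > scores[best]) {
                best = c;
            }
        }
        return best;
    }

    // Predicted class for one feature vector.
    static int predict(double[] logPi, double[][] logTheta, double[] x) {
        return argMax(logScores(logPi, logTheta, x));
    }

    // Normalized posteriors from log scores. Subtracting the maximum before
    // exponentiating avoids underflow when the scores are very negative.
    static double[] posteriors(double[] scores) {
        double max = scores[argMax(scores)];
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int c = 0; c < scores.length; c++) {
            p[c] = Math.exp(scores[c] - max);
            sum += p[c];
        }
        for (int c = 0; c < p.length; c++) {
            p[c] /= sum;
        }
        return p;
    }
}
```

With two categories and five features, logPi would have length 2 and logTheta would be 2 x 5, so each class yields a single number: its log prior plus the dot product of its theta row with the feature vector. Printing these intermediate scores next to the values from the Spark example is one way to do the comparison Sean suggests.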
--
Regards,
Jatinpreet Singh