Are you using 1.0.0? There was a bug there, which was fixed in 1.0.1 and
master. If you don't want to switch to 1.0.1 or master, try caching
and counting the test RDD first. -Xiangrui
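
A minimal sketch of that workaround, assuming Spark MLlib 1.0.x with the
`model` and `test` values from the quoted message below (names taken from
there; this is an illustration, not a verified fix):

```scala
// Sketch of the suggested workaround (assumes the `model` and `test`
// RDD from the quoted message, Spark MLlib 1.0.x).
// Caching and counting materializes `test` once, so the two subsequent
// passes over it (for features and for labels) see the same elements.
test.cache()
test.count()  // forces evaluation; later actions reuse the cached data

val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))
```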

On Mon, Jul 28, 2014 at 6:07 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> In order to evaluate the ML classification accuracy, I am zipping up the
> prediction and test labels as follows and then comparing the pairs in
> predictionAndLabel:
>
> val prediction = model.predict(test.map(_.features))
> val predictionAndLabel = prediction.zip(test.map(_.label))
>
>
> However, I am finding that predictionAndLabel.count() has fewer elements
> than test.count().  For example, my test vector has 43 elements, but
> predictionAndLabel has only 38 pairs. I have tried other samples and always
> get fewer elements after zipping.
>
> Does zipping the two vectors cause any compression? Or is this because of
> the distributed nature of the algorithm? (I am running it in local mode on a
> single machine.) In order to get the correct accuracy, I need the above
> comparison to be done by a single node on the entire test data (my data is
> quite small). How can I ensure that?
>
> thanks
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/evaluating-classification-accuracy-tp10822.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
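
For a test set this small, the comparison itself is plain Scala once the
pairs are on a single node (e.g. after `collect()`). This standalone
illustration uses made-up values standing in for collected RDD contents
(no Spark involved):

```scala
object AccuracyExample extends App {
  // Made-up prediction/label values, standing in for collected RDD contents.
  val predictions = Seq(0.0, 1.0, 1.0, 0.0, 1.0)
  val labels      = Seq(0.0, 1.0, 0.0, 0.0, 1.0)

  // zip on in-memory Seqs pairs elements up to the shorter length and,
  // for equal-length inputs, never drops anything.
  val pairs = predictions.zip(labels)
  val accuracy = pairs.count { case (p, l) => p == l }.toDouble / labels.size
  println(accuracy)  // prints 0.8 (4 of 5 predictions match)
}
```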
