Re: Decision tree classifier in MLlib

Joseph Bradley Fri, 18 Jul 2014 11:24:49 -0700

Hi Sudha,
Have you checked whether the labels are being loaded correctly?  It sounds
like the DT algorithm can't find any useful splits to make, so maybe it
thinks all the labels are the same?  Some data loading functions threshold
labels to make them binary.
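
A quick way to test this is to count the distinct labels after parsing,
before any training. The sketch below is plain Python rather than Spark code.
The comma-separated "label,f1,f2,f3,f4" layout, the sample rows, and the
helper names (parse_line, label_counts, threshold_labels) are assumptions
made for illustration; none of them come from MLlib. The second helper
imitates a loader that thresholds labels, to show how two classes can
collapse into one.

```python
from collections import Counter


def parse_line(line):
    """Parse 'label,f1,f2,f3,f4' into (label, features).

    The CSV layout is an assumption for this sketch.
    """
    parts = [float(x) for x in line.strip().split(",")]
    return parts[0], parts[1:]


def label_counts(lines):
    """Count how many parsed rows carry each label value."""
    return Counter(parse_line(line)[0] for line in lines)


def threshold_labels(labels, cutoff=0.5):
    """Hypothetical loader behaviour: map labels above cutoff to 1.0, else 0.0."""
    return [1.0 if y > cutoff else 0.0 for y in labels]


# Made-up rows standing in for the real training file.
sample = [
    "1,5.1,3.5,1.4,0.2",
    "0,7.0,3.2,4.7,1.4",
    "0,6.3,3.3,6.0,2.5",
]

counts = label_counts(sample)
print(counts)
if len(counts) < 2:
    print("Only one class present after loading; a tree cannot split on it.")

# Labels encoded as 1/2 both exceed the cutoff and collapse to one class.
print(Counter(threshold_labels([1.0, 2.0])))
```

On an RDD of LabeledPoint in PySpark, the equivalent check would be roughly
data.map(lambda p: p.label).countByValue(). If that shows a single label
value, the problem is in the loading step and not in the tree parameters.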
Hope it helps,
Joseph



On Fri, Jul 11, 2014 at 2:25 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
>
> I have a small dataset (120 training points, 30 test points) that I am
> trying to classify into binary classes (1 or 0). The dataset has 4
> numerical features and 1 binary label (1 or 0).
>
> I used LogisticRegression and SVM in MLlib and I got 100% accuracy in both
> cases. But when I used DecisionTree, I am getting only 33% accuracy
> (basically all the predicted test labels are 1 whereas actually only 10 out
> of the 30 should be 1). I tried modifying the different parameters
> (maxDepth, bins, impurity etc.) and am still able to get only 33% accuracy.
>
> I used the same dataset with R's decision tree (rpart) and I am getting
> 100% accuracy. I would like to understand why the performance of MLlib's
> decision tree model is poor and if there is some way I can improve it.
>
> thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Decision-tree-classifier-in-MLlib-tp9457.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
