Hi Sudha,

Have you checked whether the labels are being loaded correctly? It sounds like the DT algorithm can't find any useful splits to make, so maybe it thinks the labels are all the same? Some data loading functions threshold labels to make them binary.

Hope it helps,
Joseph
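A minimal sketch of the label check suggested above, assuming the labels have already been pulled into a plain Python list. The sample values, the `threshold` helper, and the cutoff are hypothetical stand-ins for whatever a loader might do. They are not MLlib's API.

```python
from collections import Counter


def label_counts(labels):
    """Return a dict mapping each distinct label to how often it occurs."""
    return dict(Counter(labels))


def threshold(labels, cutoff):
    """Hypothetical loader behaviour: labels above `cutoff` become 1, the rest 0."""
    return [1 if y > cutoff else 0 for y in labels]


# Hypothetical labels in the proportions described in the thread: 10 ones, 20 zeros.
raw = [1] * 10 + [0] * 20

# Assumption for illustration: a cutoff below every label collapses them into one class.
as_loaded = threshold(raw, cutoff=-0.5)

print(label_counts(raw))        # distribution in the source data
print(label_counts(as_loaded))  # distribution the learner would actually see
```

If the second count shows only one distinct label, no split can reduce impurity and the tree can only predict that one class, which would be consistent with the symptom described in the question.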
On Fri, Jul 11, 2014 at 2:25 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
>
> I have a small dataset (120 training points, 30 test points) that I am
> trying to classify into binary classes (1 or 0). The dataset has 4
> numerical features and 1 binary label (1 or 0).
>
> I used LogisticRegression and SVM in MLLib and I got 100% accuracy in both
> cases. But when I used DecisionTree, I am getting only 33% accuracy
> (basically all the predicted test labels are 1 whereas actually only 10 out
> of the 30 should be 1). I tried modifying the different parameters
> (maxDepth, bins, impurity etc) and still am able to get only 33% accuracy.
>
> I used the same dataset with R's decision tree (rpart) and I am getting
> 100% accuracy. I would like to understand why the performance of MLLib's
> decision tree model is poor and if there is some way I can improve it.
>
> thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Decision-tree-classifier-in-MLlib-tp9457.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.