This is a bug; it will be fixed by https://github.com/apache/spark/pull/3230
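
Until that fix is available, one possible workaround is sketched below. It relies only on the note in the original message that predicting a single row works; it is not something confirmed elsewhere in this thread. The idea is to bring the feature vectors back to the driver and call predict once per row. The helper name predict_on_driver is hypothetical, and the sketch assumes the rows fit in driver memory.

```python
def predict_on_driver(model, feature_rows):
    """Call model.predict once per row instead of passing an RDD.

    `model` is any object with a single-row predict(row) method
    (assumed here to be the trained decision tree model).
    `feature_rows` is a local iterable of feature vectors, not an RDD.
    """
    return [model.predict(row) for row in feature_rows]


# Hypothetical usage with the names from the original message:
#   rows = parsedData.map(lambda x: x.features).collect()
#   predictions = predict_on_driver(model, rows)
```

This gives up distributed prediction, so it is likely to be slow on the full 450K-row dataset and is only meant as a stopgap.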

On Wed, Nov 12, 2014 at 7:20 AM, rprabhu <[email protected]> wrote:
> Hello,
>
> I'm trying to run a classification task using mllib decision trees. After
> successfully training the model, I was trying to test the model using some
> sample rows when I hit this exception.
>
> The code snippet that caused this error is :
>
> model = DecisionTree.trainClassifier(parsedData, numClasses=2,
>                                      categoricalFeaturesInfo={0:3},
>                                      impurity='gini', maxDepth=30,
>                                      maxBins=100)
>
> predictions = model.predict(parsedData.map(lambda x: x.features))
>
> which is pretty much like the example given on the website.
>
> I'm giving all the details that I think will help here (some of them might
> not be totally useful). Please let me know if you need additional details.
>
> Programming Language: Python
> Platform: Linux (Ubuntu 14.04)
> Dataset: A part of the KDD 1999 dataset with 19 attributes and 450K rows.
> mllib version : The latest master. (Using master because of the issue
> reported here
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLLIB-Decision-Tree-ArrayIndexOutOfBounds-Exception-td16907.html)
>
> Stack Trace
> -------------
> Traceback (most recent call last):
>   File "/home/rprabhu/Coding/github/SDNDDoS/classification/DecisionTree.py", line 49, in <module>
>     predictions = model.predict(parsedData.map(lambda x: x.features))
>   File "/home/rprabhu/Software/spark/python/pyspark/mllib/tree.py", line 42, in predict
>     return self.call("predict", x.map(_convert_to_vector))
>   File "/home/rprabhu/Software/spark/python/pyspark/mllib/common.py", line 140, in call
>     return callJavaFunc(self._sc, getattr(self._java_model, name), *a)
>   File "/home/rprabhu/Software/spark/python/pyspark/mllib/common.py", line 117, in callJavaFunc
>     return _java2py(sc, func(*args))
>   File "/home/rprabhu/Software/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/rprabhu/Software/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o39.predict. Trace:
> py4j.Py4JException: Method predict([class org.apache.spark.api.java.JavaRDD]) does not exist
>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>     at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>     at py4j.Gateway.invoke(Gateway.java:252)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
> Note: I am not hitting this issue when I try to predict with just one row.
>
> predictions = model.predict(row)
>
> Can anyone let me know what is going wrong here?
>
> Thanks,
> Rahul
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Getting-py4j-protocol-Py4JError-An-error-occurred-while-calling-o39-predict-while-doing-batch-predics-tp18730.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
