Hi,
I am running ALS on a data set of around 150,000 lines on my local
machine. When I call train, I get the following exception:
print mappedRDDs.count()  # this prints the correct RDD count
model = ALS.train(mappedRDDs, 10, 15)
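For reference, one common cause of a Python worker crashing mid-serialization (which surfaces as "Connection reset by peer" on the JVM side) is malformed or non-numeric rows reaching ALS.train, which expects (user, product, rating) tuples. A defensive parse step is sketched below; the `parse_rating` helper and the "user,item,rating" input format are assumptions for illustration, not taken from the original post.

```python
# Hypothetical sketch: drop malformed rows before handing the RDD to ALS.train.
# ALS expects numeric (user, product, rating) tuples; a single bad row can
# crash the Python worker while Spark is streaming data to it.

def parse_rating(line):
    """Return a (user, product, rating) tuple, or None for malformed lines.

    Assumes each input line is "user,item,rating" (hypothetical format).
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        return (int(parts[0]), int(parts[1]), float(parts[2]))
    except ValueError:
        return None

# Usage with an RDD (assuming `lines` is the raw text RDD and `sc` a SparkContext):
# ratings = lines.map(parse_rating).filter(lambda r: r is not None)
# model = ALS.train(ratings, rank=10, iterations=15)
```

If the count after filtering differs from the raw line count, the dropped rows are a likely culprit.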
15/12/16 18:43:18 ERROR PythonRDD: Python worker exited unexpectedly
(crashed)
java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
I get this error only when I use MLlib with PySpark. How can I resolve
this issue?
Regards,
Surendran