I've got the same problem trying to execute the following scriptlet from my Eclipse environment:
    v = sc.textFile("path_to_my_file")
    print v.take(1)

  File "my_script.py", line 18, in <module>
    print v.take(1)
  File "spark-1.0.0-bin-hadoop2\python\pyspark\rdd.py", line 868, in take
    iterator = mapped._jrdd.collectPartitions(partitionsToTake)[0].iterator()
  File "spark-1.0.0-bin-hadoop2\python\lib\py4j-0.8.1-src.zip\py4j\java_gateway.py", line 537, in __call__
  File "spark-1.0.0-bin-hadoop2\python\lib\py4j-0.8.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o21.collectPartitions.
: java.net.SocketException: Connection reset by peer: socket write error

It doesn't matter whether the file is stored in HDFS or on my local hard disk. What does matter is whether the file contains more than 315 lines (records): if the file contains 315 lines or fewer, my script works perfectly!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-Failed-to-run-first-tp7691p8124.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.