I've got the same problem trying to execute the following scriptlet from my Eclipse environment:
    v = sc.textFile("path_to_my_file")
    print v.take(1)

  File "my_script.py", line 18, in <module>
    print v.take(1)
  File "spark-1.0.0-bin-hadoop2\python\pyspark\rdd.py", line 868, in take
    iterator = mapped._jrdd.collectPartitions(partitionsToTake)[0].iterator()
  File "spark-1.0.0-bin-hadoop2\python\lib\py4j-0.8.1-src.zip\py4j\java_gateway.py", line 537, in __call__
  File "spark-1.0.0-bin-hadoop2\python\lib\py4j-0.8.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o21.collectPartitions.
: java.net.SocketException: Connection reset by peer: socket write error

It doesn't matter whether the file is stored in HDFS or on my local hard disk. What does matter is whether the file contains more than 315 lines (records): if the file contains 315 lines or fewer, my script works perfectly!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-Failed-to-run-first-tp7691p8124.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.