The problem was solved after adding the hadoop-core dependency. But I think there is a misunderstanding about local files here. I found this:
"Note that if you've connected to a Spark master, it's possible that it will attempt to load the file on one of the different machines in the cluster, so make sure it's available on all the cluster machines. In general, in future you will want to put your data in HDFS, S3, or similar file systems to avoid this problem." http://docs.sigmoidanalytics.com/index.php/Using_the_Spark_Shell This means that you can't use local files with spark. I don't understand why, because after calling addFile() or textFile(), the file can be downloaded by every node on the cluster and became accessible. Anyway, if you got "Loss was due to java.io.EOFException", you have to make sure that hadoop libs are available. <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>0.9.1</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-core</artifactId> <version>2.0.0-mr1-cdh4.6.0</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.0.0-cdh4.6.0</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.0.0-cdh4.6.0</version> </dependency> Cheers! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-loading-files-Loss-was-due-to-java-io-EOFException-java-io-EOFException-tp6090p6201.html Sent from the Apache Spark User List mailing list archive at Nabble.com.