The problem was solved after adding the hadoop-core dependency. But I think
there is a misunderstanding about local files. I found this in the docs:

"Note that if you've connected to a Spark master, it's possible that it will
attempt to load the file on one of the different machines in the cluster, so
make sure it's available on all the cluster machines. In general, in future
you will want to put your data in HDFS, S3, or similar file systems to avoid
this problem."

http://docs.sigmoidanalytics.com/index.php/Using_the_Spark_Shell

This means that you can't rely on local files with Spark. I don't understand
why, because after calling addFile() or textFile(), the file can be
downloaded by every node in the cluster and becomes accessible.
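For reference, this is the addFile() pattern I mean; a minimal sketch only,
where the master URL, file path, and object name are made up:

    import org.apache.spark.{SparkContext, SparkFiles}

    object AddFileExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("spark://master:7077", "addfile-example")

        // Ship a local file to every node in the cluster.
        sc.addFile("/tmp/lookup.txt")

        val result = sc.parallelize(1 to 4).map { i =>
          // On each executor, resolve the local copy of the shipped file.
          val localPath = SparkFiles.get("lookup.txt")
          val lines = scala.io.Source.fromFile(localPath).getLines().size
          (i, lines)
        }.collect()

        result.foreach(println)
        sc.stop()
      }
    }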

Anyway, if you get "Loss was due to java.io.EOFException", make sure the
Hadoop libs are available. These Maven dependencies fixed it for me (CDH 4.6):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>0.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>2.0.0-mr1-cdh4.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.0.0-cdh4.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.0.0-cdh4.6.0</version>
        </dependency>
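
Note that the CDH artifacts above are not in Maven Central; if the build
can't resolve them, you may also need to add Cloudera's repository to your
pom.xml (the id value is arbitrary):

        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>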

Cheers!



