Hi,

I'm trying to build and run Spark 1.3.1 against Hadoop 2.2.0. I compiled
from source with the Maven command:

mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0 -Phive-thriftserver -DskipTests clean package
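
To sanity-check the build before going further, I print the versions from the shell like this (just a quick sketch of what I run; it assumes the bundled Hadoop client is on the driver classpath so org.apache.hadoop.util.VersionInfo resolves):

// Spark version the shell is running (expecting 1.3.1)
println(sc.version)

// Hadoop client version on the driver classpath (expecting 2.2.0)
println(org.apache.hadoop.util.VersionInfo.getVersion)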

When I run the shell with a local master (bin/spark-shell --master local[2]) I
can read data from HDFS without problems. When I run it on the four-node
cluster, with the same jars and setup on each node, I get these errors:

scala> val textInput = sc.textFile("hdfs:///testing/dict")
textInput: org.apache.spark.rdd.RDD[String] = hdfs:///testing/dict
MapPartitionsRDD[7] at textFile at <console>:24

scala> textInput take 10
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage
3.0 (TID 17, hadoop-kn-t503.systems.private):
java.lang.IllegalStateException: unread block data
        at
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
        at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
        at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
        at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
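
In case it helps narrow this down, here is the check I plan to run from the shell to see which Hadoop version the executors actually pick up, to compare against the driver-side value above (again just a sketch; it assumes org.apache.hadoop.util.VersionInfo is on the executor classpath):

// Hadoop version visible to each executor JVM (one line per distinct version seen)
sc.parallelize(1 to 4, 4)
  .map(_ => org.apache.hadoop.util.VersionInfo.getVersion)
  .collect()
  .distinct
  .foreach(println)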

Can anyone tell me what the issue is?
