Didn't really edit the configs much, but here's what the spark-env.sh looks like:
#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark
export STANDALONE_SPARK_MASTER_HOST=cloudera-1.testdomain.net
export SPARK_MASTER_PORT=7077
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-/user/spark/share/lib/spark-assembly.jar}

### Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
  export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

And here's the spark-defaults.conf:

spark.eventLog.dir=hdfs://cloudera-2.testdomain.net:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.master=spark://cloudera-1.testdomain.net:7077
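One thing worth checking up front, given the class-mismatch theory in the thread quoted below: where the driver and each executor actually load their Spark classes from. Here's a quick spark-shell sketch for that (not from the thread itself; it assumes a live SparkContext `sc` and at least one running executor on each worker):

// Print the jar each JVM loads Spark classes from. If the driver and an
// executor disagree here, that matches the incompatible-classes theory below.
val driverSrc = classOf[org.apache.spark.SparkConf]
  .getProtectionDomain.getCodeSource.getLocation.toString

val executorSrcs = sc.parallelize(1 to 4, 4).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  val src = classOf[org.apache.spark.SparkConf]
    .getProtectionDomain.getCodeSource.getLocation.toString
  s"$host -> $src"
}.collect().distinct

println(s"driver -> $driverSrc")
executorSrcs.foreach(println)

If even this trivial job dies with the same "unread block data" error, that by itself points at task deserialization on the executors rather than anything about the input file.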
On Wed Nov 19 2014 at 8:06:40 PM Ritesh Kumar Singh <riteshoneinamill...@gmail.com> wrote:

> As Marcelo mentioned, the issue mostly occurs when incompatible classes
> are used by the executors or the driver. Check whether the output comes
> through in spark-shell. If it does, then the problem is most likely in
> your configuration files, so it would help if you could paste the
> contents of the config files you edited.
>
> On Thu, Nov 20, 2014 at 5:45 AM, Anson Abraham <anson.abra...@gmail.com> wrote:
>
>> Sorry, meant CDH 5.2 w/ Spark 1.1.
>>
>> On Wed, Nov 19, 2014, 17:41 Anson Abraham <anson.abra...@gmail.com> wrote:
>>
>>> Yeah, the CDH distribution (1.1).
>>>
>>> On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>>>
>>>> On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>>>> > Yeah, but in this case I'm not building anything. I just deployed the
>>>> > config files in CDH 5.2 and started a spark-shell to read in a file
>>>> > and write it back out.
>>>>
>>>> In that case it is a little bit weird. Just to be sure, you are using
>>>> CDH's version of Spark, not trying to run an Apache Spark release on
>>>> top of CDH, right? (If that's the case, then we could probably move
>>>> this conversation to cdh-us...@cloudera.org, since it would be
>>>> CDH-specific.)
>>>>
>>>> > On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>>>> >>
>>>> >> Hi Anson,
>>>> >>
>>>> >> We've seen this error when incompatible classes are used in the driver
>>>> >> and executors (e.g., same class name, but the classes are different
>>>> >> and thus the serialized data is different). This can happen, for
>>>> >> example, if you're including some 3rd-party libraries in your app's
>>>> >> jar, or changing the driver/executor class paths to include these
>>>> >> conflicting libraries.
>>>> >>
>>>> >> Can you clarify whether any of the above apply to your case?
>>>> >>
>>>> >> (For example, one easy way to trigger this is to add the
>>>> >> spark-examples jar shipped with CDH 5.2 to the classpath of your
>>>> >> driver. That's one of the reasons I filed SPARK-4048, but I digress.)
>>>> >>
>>>> >> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
>>>> >> > I'm essentially loading a file and saving the output to another location:
>>>> >> >
>>>> >> > val source = sc.textFile("/tmp/testfile.txt")
>>>> >> > source.saveAsTextFile("/tmp/testsparkoutput")
>>>> >> >
>>>> >> > When I do so, I'm hitting this error:
>>>> >> >
>>>> >> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>>>> >> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
>>>> >> > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
>>>> >> > (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread
>>>> >> > block data
>>>> >> >     java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>>>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>>> >> >     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>> >> >     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>> >> >     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>> >> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>> >> >     java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>> >> >     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>> >> >     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>> >> >     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>>>> >> >     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> >> >     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >> >     java.lang.Thread.run(Thread.java:744)
>>>> >> > Driver stacktrace:
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>>> >> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> >> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>> >> >     at scala.Option.foreach(Option.scala:236)
>>>> >> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>>> >> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>>> >> >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>> >> >     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> >> >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>> >> >     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> >> >     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>> >> >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >> >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> >> >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >> >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> >> >
>>>> >> > Can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1,
>>>> >> > and the file I'm loading is literally just 7 MB. I thought it was a jar
>>>> >> > mismatch, but I compared the jars and they're all identical. And since
>>>> >> > they were all installed through CDH parcels, I'm not sure how there
>>>> >> > could be a version mismatch between the worker nodes and the master
>>>> >> > anyway. Oh yeah: 1 master node with 2 worker nodes, running standalone,
>>>> >> > not through YARN. Just in case, I also copied the jars from the master
>>>> >> > to the 2 worker nodes, and it's still the same issue.
>>>> >> > The weird thing is, the first time I installed and tested it out, it
>>>> >> > worked, but now it doesn't.
>>>> >> >
>>>> >> > Any help here would be greatly appreciated.
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>>
>>>> --
>>>> Marcelo
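Following up on the jar comparison mentioned above: comparing the files on disk by hand can miss the copy an executor JVM actually opens. A variation of the same check, run from inside the executors (sketch only; the assembly path below is a guess based on the CDH parcel layout in the spark-env.sh at the top, not something from the thread, so point it at the real spark-assembly jar under $SPARK_HOME/lib):

import java.io.FileInputStream
import java.security.{DigestInputStream, MessageDigest}

// Stream a local file through an MD5 digest (the assembly jar is large,
// so avoid loading it into memory all at once).
def md5Of(path: String): String = {
  val md = MessageDigest.getInstance("MD5")
  val in = new DigestInputStream(new FileInputStream(path), md)
  try {
    val buf = new Array[Byte](1 << 16)
    while (in.read(buf) != -1) {} // reading drives the digest
  } finally in.close()
  md.digest().map("%02x".format(_)).mkString
}

// Hypothetical path, assumed from the parcel layout in spark-env.sh above.
val assembly = "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/lib/spark-assembly.jar"

// Enough partitions that every worker should run at least one task.
val perHost = sc.parallelize(1 to 8, 8)
  .map(_ => (java.net.InetAddress.getLocalHost.getHostName, md5Of(assembly)))
  .distinct()
  .collect()

println(s"driver: ${md5Of(assembly)}")
perHost.foreach { case (host, sum) => println(s"$host: $sum") }

If the driver and worker hashes differ, that's the mismatch Marcelo describes. If they're identical and the error persists, the next suspects are extra entries on the driver or executor classpaths (e.g., the spark-examples jar he mentions).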