Hi everyone,

I'm seeing the exception below coming out of Spark 1.0.1 when I call it
from my application.  I can't share that application's source, but the
quick gist is that it uses Spark's Java APIs to read from Avro files in
HDFS, do some processing, and write back out to Avro files.  It does this
by receiving a REST call, then spinning up a new JVM as the driver
application that connects to Spark.  I'm on CDH4.4.0 and have enabled both
Kryo and speculation.  Spark is running in standalone mode on a 6-node
cluster in AWS (not launched with Spark's EC2 scripts, though).
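For reference, the relevant settings look roughly like this (a sketch in
spark-defaults.conf form; in my app the equivalent values are set
programmatically on the SparkConf):

```
# Sketch of how Kryo and speculation are enabled
spark.serializer     org.apache.spark.serializer.KryoSerializer
spark.speculation    true
```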

The stacktraces below are reliably reproducible on every run of the job.
The issue seems to be that, on deserialization of a task result on the
driver, Kryo chokes while reading the ClassManifest.

I've tried swapping in Kryo 2.23.1 in place of 2.21 (2.22 had some
backwards-compatibility issues) but hit the same error.
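In case it matters, the swap was done roughly along these lines in the
POM: excluding the Kryo that chill pulls in transitively and pinning the
newer version explicitly.  Treat the coordinates as a sketch from memory:

```xml
<!-- Sketch: exclude chill's transitive Kryo and pin the newer version.
     groupId/artifactId written from memory. -->
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>chill_2.10</artifactId>
  <version>0.3.6</version>
  <exclusions>
    <exclusion>
      <groupId>com.esotericsoftware.kryo</groupId>
      <artifactId>kryo</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.esotericsoftware.kryo</groupId>
  <artifactId>kryo</artifactId>
  <version>2.23.1</version>
</dependency>
```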

Any ideas on what can be done here?

Thanks!
Andrew



In the driver (Kryo exception while deserializing a DirectTaskResult):

INFO   | jvm 1    | 2014/07/30 20:52:52 | 20:52:52.667 [Result resolver
thread-0] ERROR o.a.spark.scheduler.TaskResultGetter - Exception while
getting task result
INFO   | jvm 1    | 2014/07/30 20:52:52 |
com.esotericsoftware.kryo.KryoException: Buffer underflow.
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.esotericsoftware.kryo.io.Input.require(Input.java:156)
~[kryo-2.21.jar:na]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
~[kryo-2.21.jar:na]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:762)
~[kryo-2.21.jar:na]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:624) ~[kryo-2.21.jar:na]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:26)
~[chill_2.10-0.3.6.jar:0.3.6]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.twitter.chill.ClassManifestSerializer.read(ClassManifestSerializer.scala:19)
~[chill_2.10-0.3.6.jar:0.3.6]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
~[kryo-2.21.jar:na]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:147)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316)
~[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
[spark-core_2.10-1.0.1.jar:1.0.1]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
INFO   | jvm 1    | 2014/07/30 20:52:52 |       at
java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]


In the DAGScheduler (job gets aborted):

org.apache.spark.SparkException: Job aborted due to stage failure:
Exception while getting task result:
com.esotericsoftware.kryo.KryoException: Buffer underflow.
    at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


In an Executor (running tasks get killed):

14/07/29 22:57:38 INFO broadcast.HttpBroadcast: Started reading broadcast
variable 0
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
153
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
147
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
141
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
135
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
150
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
144
14/07/29 22:57:39 INFO executor.Executor: Executor is trying to kill task
138
14/07/29 22:57:39 INFO storage.MemoryStore: ensureFreeSpace(241733) called
with curMem=0, maxMem=30870601728
14/07/29 22:57:39 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 236.1 KB, free 28.8 GB)
14/07/29 22:57:39 INFO broadcast.HttpBroadcast: Reading broadcast variable
0 took 0.91790748 s
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:39 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 135
org.apache.spark.TaskKilledException
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 144
org.apache.spark.TaskKilledException
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 150
org.apache.spark.TaskKilledException
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 138
org.apache.spark.TaskKilledException
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/07/29 22:57:40 ERROR executor.Executor: Exception in task ID 141
org.apache.spark.TaskKilledException
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
