Hello, I am trying to deserialize some data encoded using Protocol Buffers from within Spark and am getting class-not-found exceptions. I have narrowed the program down to something very simple that shows the problem exactly (see 'The Program' below), and hopefully someone can tell me the easy fix :)
So the situation is: I have some protobuf reports in /tmp/reports. I also have a Spark project containing the Scala code below (under 'The Program') as well as a Java file defining SensorReports, all in the same src sub-tree. It's built using sbt in the standard way. The Spark job reads the reports from /tmp/reports and prints them to the console.

When I build and run the job with spark-submit, everything works as expected and the reports are printed out. When I uncomment the 'XXX' variant in the Scala program and try to print the reports from within a Spark context instead, I get the class-not-found exceptions shown below. I don't understand why. (If I get this working, I will want to do more than just print the reports from within the Spark context.) My reading of the documentation is that my Spark job should have access to everything in the submitted jar, and that jar includes the Java code generated by the protobuf library, which defines SensorReports.

This is the spark-submit invocation I use after building my job as an assembly with the sbt-assembly plugin:

spark-submit --class com.rick.processors.NewReportProcessor --master local[*] \
    ../../../analyzer/spark/target/scala-2.10/rick-processors-assembly-1.0.jar

I have also tried adding the jar programmatically using sc.addJar (sketch at the end of this post), but that does not help. I found a bug from July (https://github.com/apache/spark/pull/181) that seems related, but it went into Spark 1.2.0 (which is what I am currently using), so I don't think that's it.

Any ideas? Thanks!
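For reference, the build is a standard sbt-assembly setup. This is a minimal sketch of the shape of my build.sbt (the name, version, Scala version and Spark version match the jar above; the exact dependency list, the "provided" scoping and the protobuf-java coordinate are approximations for illustration, not copied verbatim from my project):

// build.sbt (sketch; versions approximate)
name := "rick-processors"

version := "1.0"

scalaVersion := "2.10.4"

// Spark itself is "provided" so it is not bundled into the assembly;
// the protobuf runtime and the generated SensorReports code are bundled.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
  "com.google.protobuf" % "protobuf-java" % "2.5.0"
)

plus the usual sbt-assembly plugin wiring in project/plugins.sbt.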
The Program:
==========

package com.rick.processors

import java.io.File
import java.nio.file.{Path, Files, FileSystems}

import org.apache.spark.{SparkContext, SparkConf}
import com.rick.reports.Reports.SensorReports

object NewReportProcessor {
  private val sparkConf = new SparkConf().setAppName("ReportProcessor")
  private val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) = {
    val protoBuffsBinary = localFileReports()
    val sensorReportsBundles = protoBuffsBinary.map(bundle =>
      SensorReports.parseFrom(bundle))
    // XXX: Printing from within the SparkContext throws class-not-found
    // exceptions, why?
    // sc.makeRDD(sensorReportsBundles).foreach((x: SensorReports) => println(x.toString))
    sensorReportsBundles.foreach((x: SensorReports) => println(x.toString))
  }

  private def localFileReports() = {
    val reportDir = new File("/tmp/reports")
    val reportFiles = reportDir.listFiles.filter(_.getName.endsWith(".report"))
    reportFiles.map(file => {
      val path = FileSystems.getDefault().getPath("/tmp/reports", file.getName())
      Files.readAllBytes(path)
    })
  }
}
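(Aside: for the eventual 'do more than just print' step, the direction I had been planning to take is to parallelize the raw bytes and call parseFrom inside the closure instead of shipping parsed SensorReports objects. This is an untested sketch, and either way I would still expect the commented-out variant above to work:)

// Untested sketch: ship only the raw Array[Byte] bundles through the RDD,
// so the driver never Java-serializes a SensorReports instance; parsing
// then happens on the executors.
val rawBundles = localFileReports()
sc.makeRDD(rawBundles.toSeq)
  .map(bytes => SensorReports.parseFrom(bytes))
  .foreach(report => println(report.toString))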
The Class-not-found exceptions:
=========================

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/23 17:35:03 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.241.128 instead (on interface eth0)
15/02/23 17:35:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/23 17:35:04 INFO SecurityManager: Changing view acls to: rick
15/02/23 17:35:04 INFO SecurityManager: Changing modify acls to: rick
15/02/23 17:35:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rick); users with modify permissions: Set(rick)
15/02/23 17:35:04 INFO Slf4jLogger: Slf4jLogger started
15/02/23 17:35:04 INFO Remoting: Starting remoting
15/02/23 17:35:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.241.128:38110]
15/02/23 17:35:04 INFO Utils: Successfully started service 'sparkDriver' on port 38110.
15/02/23 17:35:04 INFO SparkEnv: Registering MapOutputTracker
15/02/23 17:35:04 INFO SparkEnv: Registering BlockManagerMaster
15/02/23 17:35:04 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150223173504-b26c
15/02/23 17:35:04 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/02/23 17:35:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/23 17:35:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-c77dbc9a-d626-4991-a9b7-f593acafbe64
15/02/23 17:35:05 INFO HttpServer: Starting HTTP Server
15/02/23 17:35:05 INFO Utils: Successfully started service 'HTTP file server' on port 50950.
15/02/23 17:35:05 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/02/23 17:35:05 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
15/02/23 17:35:05 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
15/02/23 17:35:06 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
15/02/23 17:35:06 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
15/02/23 17:35:06 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
15/02/23 17:35:06 INFO Utils: Successfully started service 'SparkUI' on port 4046.
15/02/23 17:35:06 INFO SparkUI: Started SparkUI at http://192.168.241.128:4046
15/02/23 17:35:06 INFO SparkContext: Added JAR file:/home/rick/go/src/rick/sparksprint/containers/tests/StreamingReports/../../../analyzer/spark/target/scala-2.10/rick-processors-assembly-1.0.jar at http://192.168.241.128:50950/jars/rick-processors-assembly-1.0.jar with timestamp 1424741706610
15/02/23 17:35:06 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.241.128:38110/user/HeartbeatReceiver
15/02/23 17:35:07 INFO NettyBlockTransferService: Server created on 57801
15/02/23 17:35:07 INFO BlockManagerMaster: Trying to register BlockManager
15/02/23 17:35:07 INFO BlockManagerMasterActor: Registering block manager localhost:57801 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 57801)
15/02/23 17:35:07 INFO BlockManagerMaster: Registered BlockManager
15/02/23 17:35:07 INFO SparkContext: Starting job: foreach at NewReportProcessor.scala:17
15/02/23 17:35:07 INFO DAGScheduler: Got job 0 (foreach at NewReportProcessor.scala:17) with 1 output partitions (allowLocal=false)
15/02/23 17:35:07 INFO DAGScheduler: Final stage: Stage 0(foreach at NewReportProcessor.scala:17)
15/02/23 17:35:07 INFO DAGScheduler: Parents of final stage: List()
15/02/23 17:35:07 INFO DAGScheduler: Missing parents: List()
15/02/23 17:35:07 INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at makeRDD at NewReportProcessor.scala:17), which has no missing parents
15/02/23 17:35:07 INFO MemoryStore: ensureFreeSpace(1360) called with curMem=0, maxMem=280248975
15/02/23 17:35:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1360.0 B, free 267.3 MB)
15/02/23 17:35:07 INFO MemoryStore: ensureFreeSpace(1071) called with curMem=1360, maxMem=280248975
15/02/23 17:35:07 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1071.0 B, free 267.3 MB)
15/02/23 17:35:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:57801 (size: 1071.0 B, free: 267.3 MB)
15/02/23 17:35:07 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/23 17:35:07 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/23 17:35:07 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (ParallelCollectionRDD[0] at makeRDD at NewReportProcessor.scala:17)
15/02/23 17:35:07 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/02/23 17:35:07 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 5587 bytes)
15/02/23 17:35:07 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/23 17:35:07 INFO Executor: Fetching http://192.168.241.128:50950/jars/rick-processors-assembly-1.0.jar with timestamp 1424741706610
15/02/23 17:35:08 INFO Utils: Fetching http://192.168.241.128:50950/jars/rick-processors-assembly-1.0.jar to /tmp/fetchFileTemp2793880583189398319.tmp
15/02/23 17:35:08 INFO Executor: Adding file:/tmp/spark-bdec3945-52d1-42bf-8b7a-30f14f492a42/rick-processors-assembly-1.0.jar to class loader
15/02/23 17:35:08 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: java.lang.RuntimeException: Unable to find proto buffer class
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:988)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to find proto buffer class
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
    at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:74)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: com.rick.reports.Reports$SensorReports
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:768)
    ... 37 more
15/02/23 17:35:08 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: java.lang.RuntimeException: Unable to find proto buffer class
    ... (same stack trace as the ERROR above, ending in java.lang.ClassNotFoundException: com.rick.reports.Reports$SensorReports)
15/02/23 17:35:08 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/02/23 17:35:08 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/23 17:35:08 INFO TaskSchedulerImpl: Cancelling stage 0
15/02/23 17:35:08 INFO DAGScheduler: Job 0 failed: foreach at NewReportProcessor.scala:17, took 0.644071 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: java.lang.RuntimeException: Unable to find proto buffer class
    ... (same stack trace as the ERROR above, ending in java.lang.ClassNotFoundException: com.rick.reports.Reports$SensorReports)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The Proper Report Output:
====================

reports {
  network_report {
    begin_time: 1424380054789676056
    end_time: 1424380054789740740
    source: "<some-IP-address>:80"
    destination: "<some-IP-address>:46792"
    protocol: TCP
    stream_id: 3
    stream_status: 8
  }
  http_report {
    request {
      method: "GET"
      etc...