Thanks for the response. Turns out that this post addressed the issue. http://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser We have some UDFs defined and the jar containing the class for these UDFs wasn’t in the dependent jars list. Unfortunately the actual error got masked by the one I sent below.
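For anyone who lands here with the same trace, here is a minimal sketch of the kind of change that fixes it: listing the jar that holds the UDF classes so it is shipped to the executors. The application name, master URL, and jar path below are hypothetical placeholders, not the values from our setup. `SparkConf.setJars` and `SparkContext.addJar` are the standard Spark calls for this.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values: the app name, master URL, and jar path are placeholders.
val conf = new SparkConf()
  .setAppName("udf-example")
  .setMaster("spark://master-host:7077")
  // Ship the jar that contains the UDF classes to every executor.
  .setJars(Seq("/path/to/my-udfs.jar"))

val sc = new SparkContext(conf)

// Alternatively, register the jar after the context has been created:
// sc.addJar("/path/to/my-udfs.jar")
```

When launching with spark-submit, the `--jars` option serves the same purpose.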
Jeff

From: Shixiong Zhu
Date: Sunday, September 6, 2015 at 9:02 AM
To: Jeff Jones
Cc: "user@spark.apache.org"
Subject: Re: ClassCastException in driver program

It looks like there are some circular references in SQL that make the immutable List serialization fail in 2.11.

In 2.11, Scala immutable List uses writeReplace()/readResolve(), which don't play nicely with circular references. Here is an example to reproduce this issue in 2.11.6:

    class Foo extends Serializable {
      var l: Seq[Any] = null
    }

    import java.io._

    // Serialize a List that contains an object pointing back at the List.
    val o = new ByteArrayOutputStream()
    val o1 = new ObjectOutputStream(o)
    val m = new Foo
    val n = List(1, m)
    m.l = n  // circular reference: n contains m, and m.l refers to n
    o1.writeObject(n)
    o1.close()

    // Reading the List back is the step that reproduces the issue.
    val i = new ByteArrayInputStream(o.toByteArray)
    val i1 = new ObjectInputStream(i)
    i1.readObject()

Could you provide the "explain" output? It would be helpful to find the circular references.

Best Regards,
Shixiong Zhu

2015-09-05 0:26 GMT+08:00 Jeff Jones <jjo...@adaptivebiotech.com>:

We are using Scala 2.11 for a driver program that is running Spark SQL queries in a standalone cluster. I’ve rebuilt Spark for Scala 2.11 using the instructions at http://spark.apache.org/docs/latest/building-spark.html. I’ve had to work through a few dependency conflicts, but all in all it seems to work for some simple Spark examples. I integrated the Spark SQL code into my application and I’m able to run using a local client, but when I switch over to the standalone cluster I get the following error. Any help tracking this down would be appreciated.

This exception occurs during a DataFrame.collect() call. I’ve tried to use -Dsun.io.serialization.extendedDebugInfo=true to get more information, but it didn’t provide anything more.

[error] o.a.s.s.TaskSetManager - Task 0 in stage 1.0 failed 4 times; aborting job
[error] c.a.i.c.Analyzer - Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.248.0.242): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.sql.execution.Project.projectList of type scala.collection.Seq in instance of org.apache.spark.sql.execution.Project
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(Unknown Source)
    at java.io.ObjectStreamClass.setObjFieldValues(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:477)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thanks,
Jeff

This message (and any attachments) is intended only for the designated recipient(s). It may contain confidential or proprietary information, or have other limitations on use as indicated by the sender. If you are not a designated recipient, you may not review, use, copy or distribute this message. If you received this in error, please notify the sender by reply e-mail and delete this message.