Ah, can one NOT create an RDD of an arbitrary Serializable type? It looks
like I might be getting bitten by the same "java.io.ObjectInputStream uses
root class loader only" bugs mentioned in:

 * http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-td3259.html
 * https://github.com/apache/spark/pull/181
 * http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E
 * https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I
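If that root-class-loader behavior really is the culprit, the workaround
I've seen suggested for plain Java serialization is to subclass
ObjectInputStream so class resolution goes through the thread context class
loader (which, as I understand it, Spark points at the user jars on
executors) rather than the root loader. A minimal, untested sketch -- the
class name is my own illustration, not anything Spark ships, and whether
Spark exposes a hook to actually plug this in is a separate question:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectStreamClass;

    public class ContextClassLoaderObjectInputStream extends ObjectInputStream {
        public ContextClassLoaderObjectInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            // Try the context class loader first; it can see user jars that
            // ObjectInputStream's default (root) loader does not.
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to ObjectInputStream's default resolution.
                return super.resolveClass(desc);
            }
        }
    }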
On Thu, Sep 18, 2014 at 4:51 PM, Paul Wais <pw...@yelp.com> wrote:
> Well, it looks like Spark is just not loading my code into the
> driver/executors. E.g. (bars here is a JavaRDD<MyMessage>):
>
>     List<String> foo = bars.map(
>         new Function<MyMessage, String>() {
>           {
>             // Instance initializer: runs when this anonymous Function is
>             // constructed on the driver (it is not re-run on
>             // deserialization at the executor).
>             System.err.println("classpath: " +
>                 System.getProperty("java.class.path"));
>
>             CodeSource src = com.google.protobuf.GeneratedMessageLite
>                 .class.getProtectionDomain().getCodeSource();
>             if (src != null) {
>               URL jar = src.getLocation();
>               System.err.println(
>                   "com.google.protobuf.GeneratedMessageLite from jar: " +
>                   jar.toString());
>             }
>           }
>
>           @Override
>           public String call(MyMessage v1) throws Exception {
>             return v1.getString();
>           }
>         }).collect();
>
> prints:
>
>     classpath: ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
>     com.google.protobuf.GeneratedMessageLite from jar: file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar
>
> I do see after those lines:
>
>     14/09/18 23:28:09 INFO Executor: Adding
>     file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class loader
>
> This is with:
>
>     spark-submit --master local --class MyClass --jars uber.jar uber.jar
>
> My uber.jar has protobuf 2.5; I expected GeneratedMessageLite would
> come from there. I'm using Spark 1.1 and Hadoop 2.3; Hadoop 2.3
> should use protobuf 2.5 [1] and even shade it properly. I've read claims
> on this list that Spark shades protobuf correctly since 0.9.?, and
> looking through the pom.xml on GitHub it looks like Spark includes
> protobuf 2.5 in the hadoop-2.3 profile.
>
> I guess I'm still at "What's the deal with getting Spark to distribute
> and load code from my jar correctly?"
>
> [1] http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml

On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <pw...@yelp.com> wrote:
>> Dear List,
>>
>> I'm writing an application where I have RDDs of protobuf messages.
>> When I run the app via bin/spark-submit with --master local
>> --driver-class-path path/to/my/uber.jar, Spark is able to
>> ser/deserialize the messages correctly.
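>>
>> For context, the app boils down to roughly this (a simplified sketch;
>> MyMessage is my generated protobuf class, and loadMessages() stands in
>> for my real input code):
>>
>>     import java.util.ArrayList;
>>     import java.util.List;
>>     import org.apache.spark.SparkConf;
>>     import org.apache.spark.api.java.JavaRDD;
>>     import org.apache.spark.api.java.JavaSparkContext;
>>
>>     public class MyClass {
>>       public static void main(String[] args) {
>>         SparkConf conf = new SparkConf().setAppName("ProtoJob");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>         // RDD whose elements are protobuf messages; the elements get
>>         // Java-serialized whenever they move between driver and executors.
>>         JavaRDD<MyMessage> messages = sc.parallelize(loadMessages());
>>
>>         // collect() ships elements back to the driver and forces
>>         // deserialization of MyMessage there.
>>         List<MyMessage> back = messages.collect();
>>         System.err.println("got " + back.size() + " messages");
>>       }
>>
>>       private static List<MyMessage> loadMessages() {
>>         return new ArrayList<MyMessage>();  // real input code elided
>>       }
>>     }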
>>
>> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar, or I
>> try --master spark://my.master:7077, then I run into errors that make
>> it look like my protobuf message classes are not on the classpath:
>>
>>     Exception in thread "main" org.apache.spark.SparkException: Job
>>     aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
>>     recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
>>     java.lang.RuntimeException: Unable to find proto buffer class
>>       com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
>>       sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       java.lang.reflect.Method.invoke(Method.java:606)
>>       java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
>>       java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
>>       ...
>>
>> Why do I need --driver-class-path in the local scenario? And how can
>> I ensure my classes are on the classpath no matter how my app is
>> submitted via bin/spark-submit (e.g. --master spark://my.master:7077)?
>> I've tried poking through the shell scripts and SparkSubmit.scala, and
>> unfortunately I haven't been able to grok exactly what Spark is doing
>> with the remote/local JVMs.
>>
>> Cheers,
>> -Paul
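(Replying to myself with one more thing I plan to try: setting the jars
programmatically on the SparkConf, so the uber jar gets shipped to
executors however the app is launched. An untested sketch -- the class
name and jar path are placeholders, and as far as I can tell this only
covers the executors, not the driver's own classpath:)

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SubmitWithJars {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("ProtoJob")
            // Distribute our uber jar to every executor, independent of
            // spark-submit flags; the path is a placeholder.
            .setJars(new String[] { "/path/to/my/uber.jar" });

        // If the real issue is the assembly's protobuf 2.5 shadowing mine,
        // Spark 1.1's experimental flag for preferring user jars might
        // also be worth a try:
        // conf.set("spark.files.userClassPathFirst", "true");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Equivalent alternative once the context exists:
        // sc.addJar("/path/to/my/uber.jar");
      }
    }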