Ah, can one NOT create an RDD of an arbitrary Serializable type? It looks
like I might be getting bitten by the same "java.io.ObjectInputStream uses
root class loader only" bugs mentioned in:

 * http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-td3259.html
 * https://github.com/apache/spark/pull/181
 * http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E
 * https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I
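If that root-class-loader behavior really is the culprit, the workaround
I've seen suggested for plain Java serialization is to subclass
ObjectInputStream so class resolution goes through the thread context class
loader (which, as I understand it, Spark points at the user jars on
executors) rather than the root loader. A minimal, untested sketch -- the
class name is my own illustration, not anything Spark ships, and whether
Spark exposes a hook to actually plug this in is a separate question:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectStreamClass;

    public class ContextClassLoaderObjectInputStream extends ObjectInputStream {
        public ContextClassLoaderObjectInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            // Try the context class loader first; it can see user jars that
            // ObjectInputStream's default (root) loader does not.
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to ObjectInputStream's default resolution.
                return super.resolveClass(desc);
            }
        }
    }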
On Thu, Sep 18, 2014 at 4:51 PM, Paul Wais <pw...@yelp.com> wrote:
> Well, it looks like Spark is just not loading my code into the
> driver/executors. E.g. (bars here is a JavaRDD<MyMessage>):
>
>     List<String> foo = bars.map(
>         new Function<MyMessage, String>() {
>           {
>             // Instance initializer: runs when this anonymous Function is
>             // constructed on the driver (it is not re-run on
>             // deserialization at the executor).
>             System.err.println("classpath: " +
>                 System.getProperty("java.class.path"));
>
>             CodeSource src = com.google.protobuf.GeneratedMessageLite
>                 .class.getProtectionDomain().getCodeSource();
>             if (src != null) {
>               URL jar = src.getLocation();
>               System.err.println(
>                   "com.google.protobuf.GeneratedMessageLite from jar: " +
>                   jar.toString());
>             }
>           }
>
>           @Override
>           public String call(MyMessage v1) throws Exception {
>             return v1.getString();
>           }
>         }).collect();
>
> prints:
>
>     classpath: ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
>     com.google.protobuf.GeneratedMessageLite from jar: file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar
>
> I do see after those lines:
>
>     14/09/18 23:28:09 INFO Executor: Adding
>     file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class loader
>
> This is with:
>
>     spark-submit --master local --class MyClass --jars uber.jar uber.jar
>
> My uber.jar has protobuf 2.5; I expected GeneratedMessageLite would
> come from there. I'm using Spark 1.1 and Hadoop 2.3; Hadoop 2.3
> should use protobuf 2.5 [1] and even shade it properly. I've read claims
> on this list that Spark shades protobuf correctly since 0.9.?, and
> looking through the pom.xml on GitHub it looks like Spark includes
> protobuf 2.5 in the hadoop-2.3 profile.
>
> I guess I'm still at "What's the deal with getting Spark to distribute
> and load code from my jar correctly?"
>
> [1] http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml

On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <pw...@yelp.com> wrote:
>> Dear List,
>>
>> I'm writing an application where I have RDDs of protobuf messages.
>> When I run the app via bin/spark-submit with --master local
>> --driver-class-path path/to/my/uber.jar, Spark is able to
>> ser/deserialize the messages correctly.
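>>
>> For context, the app boils down to roughly this (a simplified sketch;
>> MyMessage is my generated protobuf class, and loadMessages() stands in
>> for my real input code):
>>
>>     import java.util.ArrayList;
>>     import java.util.List;
>>     import org.apache.spark.SparkConf;
>>     import org.apache.spark.api.java.JavaRDD;
>>     import org.apache.spark.api.java.JavaSparkContext;
>>
>>     public class MyClass {
>>       public static void main(String[] args) {
>>         SparkConf conf = new SparkConf().setAppName("ProtoJob");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>         // RDD whose elements are protobuf messages; the elements get
>>         // Java-serialized whenever they move between driver and executors.
>>         JavaRDD<MyMessage> messages = sc.parallelize(loadMessages());
>>
>>         // collect() ships elements back to the driver and forces
>>         // deserialization of MyMessage there.
>>         List<MyMessage> back = messages.collect();
>>         System.err.println("got " + back.size() + " messages");
>>       }
>>
>>       private static List<MyMessage> loadMessages() {
>>         return new ArrayList<MyMessage>();  // real input code elided
>>       }
>>     }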
>>
>> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar, or I
>> try --master spark://my.master:7077, then I run into errors that make
>> it look like my protobuf message classes are not on the classpath:
>>
>>     Exception in thread "main" org.apache.spark.SparkException: Job
>>     aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
>>     recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
>>     java.lang.RuntimeException: Unable to find proto buffer class
>>       com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
>>       sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       java.lang.reflect.Method.invoke(Method.java:606)
>>       java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
>>       java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
>>       ...
>>
>> Why do I need --driver-class-path in the local scenario? And how can
>> I ensure my classes are on the classpath no matter how my app is
>> submitted via bin/spark-submit (e.g. --master spark://my.master:7077)?
>> I've tried poking through the shell scripts and SparkSubmit.scala, and
>> unfortunately I haven't been able to grok exactly what Spark is doing
>> with the remote/local JVMs.
>>
>> Cheers,
>> -Paul
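(Replying to myself with one more thing I plan to try: setting the jars
programmatically on the SparkConf, so the uber jar gets shipped to
executors however the app is launched. An untested sketch -- the class
name and jar path are placeholders, and as far as I can tell this only
covers the executors, not the driver's own classpath:)

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SubmitWithJars {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("ProtoJob")
            // Distribute our uber jar to every executor, independent of
            // spark-submit flags; the path is a placeholder.
            .setJars(new String[] { "/path/to/my/uber.jar" });

        // If the real issue is the assembly's protobuf 2.5 shadowing mine,
        // Spark 1.1's experimental flag for preferring user jars might
        // also be worth a try:
        // conf.set("spark.files.userClassPathFirst", "true");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Equivalent alternative once the context exists:
        // sc.addJar("/path/to/my/uber.jar");
      }
    }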