Well, it looks like Spark is just not loading my code into the
driver/executors.... E.g.:
// bars is a JavaRDD<MyMessage>
List<String> foo = bars.map(
    new Function<MyMessage, String>() {
      // Instance initializer: runs when this anonymous Function is constructed
      {
        System.err.println("classpath: " +
            System.getProperty("java.class.path"));
        CodeSource src =
            com.google.protobuf.GeneratedMessageLite.class.getProtectionDomain().getCodeSource();
        if (src != null) {
          URL jar = src.getLocation();
          System.err.println("com.google.protobuf.GeneratedMessageLite from jar: "
              + jar.toString());
        }
      }

      @Override
      public String call(MyMessage v1) throws Exception {
        return v1.getString();
      }
    }).collect();
prints:
classpath: ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.1.jar:/opt/spark/lib/datanucleus-rdbms-3.2.1.jar:/opt/spark/lib/datanucleus-core-3.2.2.jar
com.google.protobuf.GeneratedMessageLite from jar: file:/opt/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar
I do see after those lines:
14/09/18 23:28:09 INFO Executor: Adding file:/tmp/spark-cc147338-183f-46f6-b698-5b897e808a08/uber.jar to class loader
This is with:
spark-submit --master local --class MyClass --jars uber.jar uber.jar
My uber.jar has protobuf 2.5; I expected GeneratedMessageLite would
come from there. I'm using spark 1.1 and hadoop 2.3; hadoop 2.3
should use protobuf 2.5[1] and even shade it properly. I read claims
in this list that Spark shades protobuf correctly since 0.9.? and
looking thru the pom.xml on github it looks like Spark includes
protobuf 2.5 in the hadoop 2.3 profile.
I guess I'm still at "What's the deal with getting Spark to distribute
and load code from my jar correctly?"
[1]
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml
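
The code-source check from the snippet above can be pulled out into a small standalone helper, which makes it easier to call from different places (the driver's main method, or inside call() on a task). This is only a sketch: ClassOrigin, locationOf and loaderChain are hypothetical names, not part of Spark or protobuf, and it uses nothing beyond the JDK. One caveat worth keeping in mind: an instance initializer runs where the object is constructed, and Java deserialization does not re-run it, so a check placed there most likely reports on the JVM that built the Function rather than on the JVM that executes the task.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or protobuf): reports where a class came from.
class ClassOrigin {

    // Location (jar or directory URL) the class was loaded from, or "<unknown>"
    // when the JVM reports no code source (typical for core JDK classes).
    static String locationOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        URL url = (src == null) ? null : src.getLocation();
        return (url == null) ? "<unknown>" : url.toString();
    }

    // Class loader chain from the defining loader up to the root,
    // or "<bootstrap>" when the class was defined by the bootstrap loader.
    static String loaderChain(Class<?> clazz) {
        ClassLoader cl = clazz.getClassLoader();
        if (cl == null) {
            return "<bootstrap>";
        }
        StringBuilder sb = new StringBuilder();
        for (; cl != null; cl = cl.getParent()) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(cl.getClass().getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Class name to inspect; defaults to a JDK class so the sketch runs anywhere.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> clazz = Class.forName(name);
        System.err.println(name + " from: " + locationOf(clazz));
        System.err.println(name + " loaders: " + loaderChain(clazz));
    }
}
```

Calling ClassOrigin.locationOf(com.google.protobuf.GeneratedMessageLite.class) and ClassOrigin.loaderChain(MyMessage.class) from inside call() should show which jar and which class loader the task itself sees, which may differ from what the driver reports.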
On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais <[email protected]> wrote:
> Dear List,
>
> I'm writing an application where I have RDDs of protobuf messages.
> When I run the app via bin/spark-submit with --master local
> --driver-class-path path/to/my/uber.jar, Spark is able to
> ser/deserialize the messages correctly.
>
> However, if I run WITHOUT --driver-class-path path/to/my/uber.jar or I
> try --master spark://my.master:7077 , then I run into errors that make
> it look like my protobuf message classes are not on the classpath:
>
> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
> recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost):
> java.lang.RuntimeException: Unable to find proto buffer class
>
> com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
>
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
> ...
>
> Why do I need --driver-class-path in the local scenario? And how can
> I ensure my classes are on the classpath no matter how my app is
> submitted via bin/spark-submit (e.g. --master spark://my.master:7077 )
> ? I've tried poking through the shell scripts and SparkSubmit.scala
> and unfortunately I haven't been able to grok exactly what Spark is
> doing with the remote/local JVMs.
>
> Cheers,
> -Paul