Todd, I don't have any answers for you, other than that the file is actually named spark-defaults.conf (not sure if you made a typo in the email or misnamed the file). Do any other options from that file get read?
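One quick way to answer that yourself -- just a sketch, assuming your app builds its own SparkConf/SparkContext (the app name below is a placeholder) -- is to turn on spark.logConf, or dump the effective configuration from the driver, and see whether any entries from that file show up at all:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: log the effective configuration at startup so you can see whether
// spark-defaults.conf (including spark.executor.extraClassPath) was actually read.
val conf = new SparkConf()
  .setAppName("conf-check")        // placeholder app name
  .set("spark.logConf", "true")    // Spark logs the resolved conf when the SparkContext starts
val sc = new SparkContext(conf)

// Or print it explicitly from the driver:
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k = $v") }

If spark.executor.extraClassPath doesn't appear in that output, the problem is upstream of the executors.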
I also wanted to ask if you built the spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar from trunk or if they published a 1.3 drop somewhere -- I'm just starting out with Cassandra and discovered https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...

On Fri, May 22, 2015 at 6:15 PM, Todd Nist <tsind...@gmail.com> wrote:

> I'm using the spark-cassandra-connector from DataStax in a Spark Streaming
> job launched from my own driver. It is connecting to a standalone cluster
> on my local box which has two workers running.
>
> This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
> added the following entry to my $SPARK_HOME/conf/spark-default.conf:
>
> spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
>
> When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes
> up just fine. As do the two workers, with the following commands:
>
> Worker 1, port 8081:
>
> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
>
> Worker 2, port 8082:
>
> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
>
> When I execute the driver, connecting to the master:
>
> sbt app/run -Dspark.master=spark://radtech.io:7077
>
> It starts up, but when the executors are launched they do not include the
> entry from spark.executor.extraClassPath:
>
> 15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
> 15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
>
> which then causes the executor to fail with a ClassNotFoundException,
> which I would expect:
>
> [WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:344)
>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
> I also notice that some of the entries on the executor classpath are
> duplicated? This is a newly installed spark-1.3.1-bin-hadoop2.6
> standalone cluster, just to ensure I had nothing from testing in the way.
>
> I can set SPARK_CLASSPATH in $SPARK_HOME/spark-env.sh and it will
> pick up the jar and append it fine.
>
> Any suggestions on what is going on here? It seems to just ignore whatever I
> have in spark.executor.extraClassPath. Is there a different way to do
> this?
>
> TIA.
>
> -Todd
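The only other thought I have, and it's untested on my end: since you launch the driver with sbt app/run rather than spark-submit, I'm not sure the defaults file gets read at all, so you could try setting the property directly on the SparkConf before creating the SparkContext. A rough sketch only -- the jar path is copied from your conf entry, and the app/master names are just what I gleaned from your logs:

import org.apache.spark.{SparkConf, SparkContext}

// Path to the connector assembly, taken from your spark-defaults entry; adjust as needed.
val connectorJar =
  "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"

val conf = new SparkConf()
  .setAppName("KillrWeatherApp")                        // as seen in your worker log
  .setMaster("spark://radtech.io:7077")
  .set("spark.executor.extraClassPath", connectorJar)   // prepended to the executor classpath
  .setJars(Seq(connectorJar))                           // also ships the jar to the executors
val sc = new SparkContext(conf)

With setJars the executors fetch the jar from the driver, so the path only has to exist on the driver machine; spark.executor.extraClassPath, by contrast, expects the path to be present on every worker.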