In my experience, you should not put any application-specific settings into 
spark-defaults.conf, since that file is applied to all applications.


Instead, you can either set them programmatically, as you did below, or 
through spark-submit.


Also, if you would still like to do it via spark-defaults.conf, you will have to 
change that file on all of your worker nodes when you go distributed one day. This 
is neither scalable nor right, as you would have to put your app-specific 
class path into the spark-defaults.conf on every one of your Spark worker nodes.
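
As a rough sketch of the spark-submit route, an application can keep its own settings in one place and render them as "--conf key=value" flags. The names SubmitArgs and toConfFlags and the jar path are hypothetical and only for illustration; the one thing taken from spark-submit itself is the --conf flag.

```scala
object SubmitArgs {
  // Render per-application settings as spark-submit "--conf key=value" flags.
  def toConfFlags(settings: Seq[(String, String)]): Seq[String] =
    settings.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit = {
    // Hypothetical jar location, for illustration only.
    val settings = Seq(
      "spark.executor.extraClassPath" -> "/some/shared/directory/lib/app-deps.jar"
    )
    // Print the start of a spark-submit command line carrying these settings.
    println(("spark-submit" +: toConfFlags(settings)).mkString(" "))
  }
}
```

This keeps the setting with the application that needs it, so nothing has to change in the cluster-wide defaults file.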


------------------ Original Message ------------------
From: Todd Nist <[email protected]>
Sent: 2015-05-24 02:14
To: yana.kadiyska <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: spark.executor.extraClassPath - Values not picked up by executors



Hi Yana,

Yes, that was a typo in the email; the file name is correct, "spark-defaults.conf". Thanks 
though.  So it appears to work if I specify it in the driver as part of the 
SparkConf:

 
import org.apache.spark.SparkConf

// Put the connector assembly jar on each executor's class path.
val conf = new SparkConf().setAppName(getClass.getSimpleName)
  .set("spark.executor.extraClassPath",
    "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

I thought the spark-defaults would be applied regardless of whether it was a 
spark-submit (driver) or a custom driver as in my case, but apparently I am 
mistaken.  This will work fine, as I can ensure that all hosts participating in 
the cluster have access to a common directory with the dependencies and then 
just set the spark.executor.extraClassPath to 
"/some/shared/directory/lib/*" (the JVM class path wildcard form is "dir/*", not "*.jar").

If there is a better way to address this, let me know.
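
The shared-directory idea can also be sketched without relying on wildcards, by expanding the directory into an explicit list of jars and passing the result as the value of spark.executor.extraClassPath. The object name SharedLibClasspath and the helper fromDirectory are hypothetical, and this assumes the directory is mounted at the same path on every worker.

```scala
import java.io.File

object SharedLibClasspath {
  // List every .jar directly under `dir` and join the absolute paths with the
  // platform path separator (":" on Unix-like systems). A missing or
  // unreadable directory yields an empty string.
  def fromDirectory(dir: File): String = {
    val jars = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
    jars.mkString(File.pathSeparator)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder directory taken from the message above.
    println(fromDirectory(new File("/some/shared/directory/lib")))
  }
}
```

The resulting string could then be handed to conf.set("spark.executor.extraClassPath", ...) in the driver, in the same way as the single jar path earlier in this message.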

As for the spark-cassandra-connector 1.3.0-SNAPSHOT, I am building that from 
master.  Haven't hit any issue with it yet.

-Todd



On Fri, May 22, 2015 at 9:39 PM, Yana Kadiyska <[email protected]> wrote:
Todd, I don't have any answers for you...other than the file is actually named 
spark-defaults.conf (not sure if you made a typo in the email or misnamed the 
file...). Do any other options from that file get read?
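
One quick way to see which options the file actually contains is to parse it directly. This is only a sketch: it assumes the file can be read as a Java properties-style file (key and value separated by whitespace), and the names DefaultsFileCheck and sparkEntries are hypothetical. It checks the file's contents only, not whether a given driver loads the file.

```scala
import java.io.{File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.Properties
import scala.collection.JavaConverters._

object DefaultsFileCheck {
  // Parse a properties-style file and return only the keys starting with "spark.".
  def sparkEntries(file: File): Map[String, String] = {
    val props = new Properties()
    val reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)
    try props.load(reader) finally reader.close()
    props.stringPropertyNames().asScala
      .filter(_.startsWith("spark."))
      .map(k => k -> props.getProperty(k).trim)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Pass the path to the defaults file as the first argument.
    sparkEntries(new File(args(0))).toSeq.sorted.foreach {
      case (k, v) => println(s"$k = $v")
    }
  }
}
```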

I also wanted to ask if you built the 
spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar from trunk or if they 
published a 1.3 drop somewhere -- I'm just starting out with Cassandra and 
discovered
https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...



On Fri, May 22, 2015 at 6:15 PM, Todd Nist <[email protected]> wrote:
I'm using the spark-cassandra-connector from DataStax in a Spark Streaming job 
launched from my own driver.  It is connecting to a standalone cluster on my 
local box which has two workers running.

This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT.  I have added 
the following entry to my $SPARK_HOME/conf/spark-default.conf:


spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar



When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes up 
just fine, as do the two workers with the following commands:


Worker 1, port 8081:

radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2

Worker 2, port 8082:

radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2

When I execute the driver, connecting to the master:


sbt app/run -Dspark.master=spark://radtech.io:7077

It starts up, but when the executors are launched, their class path does not 
include the entry from spark.executor.extraClassPath:


15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://[email protected]:55923/user/Worker"





This then causes the executor to fail with a ClassNotFoundException, which 
I would expect:

[WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:344)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I also notice that some of the entries on the executor classpath are 
duplicated.  This is a newly installed spark-1.3.1-bin-hadoop2.6 standalone 
cluster, set up just to ensure I had nothing from testing in the way.
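
To confirm the duplication without reading the launch command by eye, the "-cp" value can be split and checked for repeats. A minimal sketch, where the object name ClasspathDuplicates, the helper duplicates, and the a.jar entry in the sample string are hypothetical:

```scala
import java.util.regex.Pattern

object ClasspathDuplicates {
  // Return the entries that occur more than once, in order of first appearance.
  def duplicates(classpath: String, separator: String = ":"): Seq[String] = {
    val entries = classpath.split(Pattern.quote(separator)).filter(_.nonEmpty).toSeq
    val counts = entries.groupBy(identity).map { case (entry, group) => entry -> group.size }
    entries.distinct.filter(entry => counts(entry) > 1)
  }

  def main(args: Array[String]): Unit = {
    // Shortened, partly made-up sample; paste the real "-cp" value here instead.
    val cp = "/usr/local/spark/conf:/usr/local/spark/lib/a.jar:/usr/local/spark/conf"
    duplicates(cp).foreach(println)
  }
}
```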


I can set SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh and it will pick 
up the jar and append it fine.  


Any suggestions on what is going on here?  It seems to just ignore whatever I have 
in spark.executor.extraClassPath.  Is there a different way to do this? 


TIA.


-Todd
