In my experience, you should not put application-specific settings in spark-defaults.conf, which is applied to all applications.
Instead, you can set them either programmatically, as you did below, or through spark-submit. Also, if you still want to do it via spark-defaults.conf, you will have to change that file on all of your worker nodes once you go distributed. That is neither scalable nor correct, since you would have to put your app-specific class path into spark-defaults.conf on every Spark worker node.

Sent from my iPhone

------------------ Original Message ------------------
From: Todd Nist <[email protected]>
Sent: May 24, 2015, 02:14
To: yana.kadiyska <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: spark.executor.extraClassPath - Values not picked up by executors

Hi Yana,

Yes, that was a typo in the email; the file name is correct, "spark-defaults.conf". Thanks though.

So it appears to work if I specify it in the driver as part of the SparkConf:

    val conf = new SparkConf().setAppName(getClass.getSimpleName)
      .set("spark.executor.extraClassPath", "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

I thought the spark-defaults would be applied regardless of whether it was a spark-submit (driver) or a custom driver as in my case, but apparently I am mistaken. This will work fine, as I can ensure that all hosts participating in the cluster have access to a common directory with the dependencies and then just set spark.executor.extraClassPath to "/some/shared/directory/lib/*.jar". If there is a better way to address this, let me know.

As for the spark-cassandra-connector 1.3.0-SNAPSHOT, I am building that from master. Haven't hit any issues with it yet.

-Todd

On Fri, May 22, 2015 at 9:39 PM, Yana Kadiyska <[email protected]> wrote:

Todd, I don't have any answers for you... other than that the file is actually named spark-defaults.conf (not sure if you made a typo in the email or misnamed the file...). Do any other options from that file get read?
I also wanted to ask if you built the spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar from trunk or if they published a 1.3 drop somewhere -- I'm just starting out with Cassandra and discovered https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...

On Fri, May 22, 2015 at 6:15 PM, Todd Nist <[email protected]> wrote:

I'm using the spark-cassandra-connector from DataStax in a Spark Streaming job launched from my own driver. It is connecting to a standalone cluster on my local box which has two workers running.

This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have added the following entry to my $SPARK_HOME/conf/spark-default.conf:

    spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar

When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes up just fine, as do the two workers with the following commands:

Worker 1, port 8081:

    radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2

Worker 2, port 8082:

    radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2

When I execute the driver connecting to the master:

    sbt app/run -Dspark.master=spark://radtech.io:7077

it starts up, but when the executors are launched they do not include the entry from spark.executor.extraClassPath:

    15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
    15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://[email protected]:55923/user/Worker"

which then causes the executor to fail with a ClassNotFoundException, as I would expect:

    [WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:344)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I also notice that some of the entries on the executor classpath are duplicated. This is a newly installed spark-1.3.1-bin-hadoop2.6 standalone cluster, just to ensure I had nothing from testing in the way.

I can set SPARK_CLASSPATH in $SPARK_HOME/spark-env.sh and it will pick up the jar and append it fine.

Any suggestions on what is going on here? It seems to just ignore whatever I have in spark.executor.extraClassPath. Is there a different way to do this?

TIA.

-Todd
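
A note on the shared-directory idea from the thread: as far as I know, the JVM's classpath wildcard has the form "dir/*" and a pattern like "lib/*.jar" is not expanded, so it may be safer to pass an explicit list of jars. The following is only a minimal sketch of that, using nothing but the standard library. `ExecutorClassPath` and `fromDirectory` are hypothetical names (not part of Spark), and the directory in the trailing comment is the placeholder path from the thread, assumed to exist at the same location on every worker host.

```scala
import java.io.File

object ExecutorClassPath {
  // Hypothetical helper, not a Spark API: collect the .jar files found
  // directly inside `dir` and join their absolute paths with the platform
  // path separator (":" on Linux and macOS).
  // Returns an empty string if `dir` does not exist or contains no jars.
  def fromDirectory(dir: File): String = {
    val jars = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted // stable ordering so the classpath is reproducible
    jars.mkString(File.pathSeparator)
  }
}

// Possible use in a custom driver (assumes Spark is on the driver classpath):
//
//   val conf = new SparkConf()
//     .setAppName(getClass.getSimpleName)
//     .set("spark.executor.extraClassPath",
//          ExecutorClassPath.fromDirectory(new File("/some/shared/directory/lib")))
```

The paths are resolved on the machine running the driver, so this only helps when the executors see the same files at the same absolute paths, which is the setup described above.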
