Hi Yana,

Yes, typo in the email; the file name is correct, "spark-defaults.conf". Thanks though.

So it appears to work if, in the driver, I specify it as part of the SparkConf:
    val conf = new SparkConf()
      .setAppName(getClass.getSimpleName)
      .set("spark.executor.extraClassPath",
        "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

I thought the spark-defaults would be applied regardless of whether it was a spark-submit (driver) or a custom driver as in my case, but apparently I am mistaken.

This will work fine, as I can ensure that all hosts participating in the cluster have access to a common directory with the dependencies and then just set spark.executor.extraClassPath to "/some/shared/directory/lib/*.jar". If there is a better way to address this, let me know.

As for the spark-cassandra-connector 1.3.0-SNAPSHOT, I am building that from master. Haven't hit any issues with it yet.

-Todd
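(A minimal sketch of the shared-directory approach described above, assuming a hypothetical directory /some/shared/directory/lib that exists, with the same jars, at the same path on every worker host. The object name, app name, and the "dir/*" wildcard form are illustrative assumptions, not part of the original setup.)

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedClasspathApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SharedClasspathApp")           // placeholder app name
          .setMaster("spark://radtech.io:7077")       // standalone master from the thread
          // Executors resolve this path on their own host, so the directory
          // (with the same jars) must exist on every worker machine.
          // A JVM classpath wildcard takes the form "dir/*" and matches every jar in dir.
          .set("spark.executor.extraClassPath", "/some/shared/directory/lib/*")

        val sc = new SparkContext(conf)
        // ... job logic that uses classes from the shared jars goes here ...
        sc.stop()
      }
    }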
On Fri, May 22, 2015 at 9:39 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> Todd, I don't have any answers for you... other than the file is actually
> named spark-defaults.conf (not sure if you made a typo in the email or
> misnamed the file...). Do any other options from that file get read?
>
> I also wanted to ask if you built the spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
> from trunk or if they published a 1.3 drop somewhere -- I'm just starting
> out with Cassandra and discovered
> https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...
>
> On Fri, May 22, 2015 at 6:15 PM, Todd Nist <tsind...@gmail.com> wrote:
>
>> I'm using the spark-cassandra-connector from DataStax in a Spark Streaming
>> job launched from my own driver. It is connecting to a standalone cluster
>> on my local box, which has two workers running.
>>
>> This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
>> added the following entry to my $SPARK_HOME/conf/spark-default.conf:
>>
>> spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
>>
>> When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes up
>> just fine, as do the two workers, started with the following commands:
>>
>> Worker 1, port 8081:
>>
>> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
>>
>> Worker 2, port 8082:
>>
>> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
>>
>> When I execute the driver, connecting to the master:
>>
>> sbt app/run -Dspark.master=spark://radtech.io:7077
>>
>> it starts up, but when the executors are launched they do not include the
>> entry from spark.executor.extraClassPath:
>>
>> 15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
>> 15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
>>
>> which will then cause the executor to fail with a ClassNotFoundException,
>> which I would expect:
>>
>> [WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:344)
>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I also notice that some of the entries on the executor classpath are
>> duplicated? This is a newly installed spark-1.3.1-bin-hadoop2.6 standalone
>> cluster, just to ensure I had nothing from testing in the way.
>>
>> I can set SPARK_CLASSPATH in $SPARK_HOME/spark-env.sh and it will pick up
>> the jar and append it fine.
>>
>> Any suggestions on what is going on here? It seems to just ignore whatever
>> I have in spark.executor.extraClassPath. Is there a different way to do
>> this?
>>
>> TIA.
>>
>> -Todd
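(Not from the thread itself, but a sketch of one alternative the closing question invites: instead of relying on spark.executor.extraClassPath, the driver can ship the connector assembly to the executors with SparkConf.setJars, i.e. the spark.jars setting. The object name is a placeholder and the jar path is simply reused from the thread; whether this fits the actual application is an assumption.)

    import org.apache.spark.{SparkConf, SparkContext}

    object ShippedJarApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ShippedJarApp")               // placeholder app name
          .setMaster("spark://radtech.io:7077")      // standalone master from the thread
          // Jars listed here are served by the driver and fetched by each
          // executor when it starts, so no shared directory or per-worker
          // copy of the assembly is required.
          .setJars(Seq(
            "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"))

        val sc = new SparkContext(conf)
        // ... Cassandra / streaming job logic goes here ...
        sc.stop()
      }
    }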