Todd, I don't have any answers for you, other than that the file is actually named spark-defaults.conf (not sure if you made a typo in the email or misnamed the file). Do any other options from that file get read?
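One quick way to answer that yourself -- just a sketch, assuming your app builds its own SparkConf/SparkContext (the app name below is a placeholder) -- is to turn on spark.logConf, or dump the effective configuration from the driver, and see whether any entries from that file show up at all:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: log the effective configuration at startup so you can see whether
// spark-defaults.conf (including spark.executor.extraClassPath) was actually read.
val conf = new SparkConf()
  .setAppName("conf-check")        // placeholder app name
  .set("spark.logConf", "true")    // Spark logs the resolved conf when the SparkContext starts
val sc = new SparkContext(conf)

// Or print it explicitly from the driver:
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k = $v") }

If spark.executor.extraClassPath doesn't appear in that output, the problem is upstream of the executors.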
I also wanted to ask if you built the spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar from trunk or if they published a 1.3 drop somewhere -- I'm just starting out with Cassandra and discovered https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...

On Fri, May 22, 2015 at 6:15 PM, Todd Nist <tsind...@gmail.com> wrote:

> I'm using the spark-cassandra-connector from DataStax in a Spark Streaming
> job launched from my own driver. It is connecting to a standalone cluster
> on my local box which has two workers running.
>
> This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
> added the following entry to my $SPARK_HOME/conf/spark-default.conf:
>
> spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
>
> When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes
> up just fine. As do the two workers, with the following commands:
>
> Worker 1, port 8081:
>
> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
>
> Worker 2, port 8082:
>
> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
>
> When I execute the driver, connecting to the master:
>
> sbt app/run -Dspark.master=spark://radtech.io:7077
>
> It starts up, but when the executors are launched they do not include the
> entry from spark.executor.extraClassPath:
>
> 15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
> 15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
>
> which then causes the executor to fail with a ClassNotFoundException,
> which I would expect:
>
> [WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:344)
>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
> I also notice that some of the entries on the executor classpath are
> duplicated? This is a newly installed spark-1.3.1-bin-hadoop2.6
> standalone cluster, just to ensure I had nothing from testing in the way.
>
> I can set SPARK_CLASSPATH in $SPARK_HOME/spark-env.sh and it will
> pick up the jar and append it fine.
>
> Any suggestions on what is going on here? It seems to just ignore whatever I
> have in spark.executor.extraClassPath. Is there a different way to do
> this?
>
> TIA.
>
> -Todd
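The only other thought I have, and it's untested on my end: since you launch the driver with sbt app/run rather than spark-submit, I'm not sure the defaults file gets read at all, so you could try setting the property directly on the SparkConf before creating the SparkContext. A rough sketch only -- the jar path is copied from your conf entry, and the app/master names are just what I gleaned from your logs:

import org.apache.spark.{SparkConf, SparkContext}

// Path to the connector assembly, taken from your spark-defaults entry; adjust as needed.
val connectorJar =
  "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"

val conf = new SparkConf()
  .setAppName("KillrWeatherApp")                        // as seen in your worker log
  .setMaster("spark://radtech.io:7077")
  .set("spark.executor.extraClassPath", connectorJar)   // prepended to the executor classpath
  .setJars(Seq(connectorJar))                           // also ships the jar to the executors
val sc = new SparkContext(conf)

With setJars the executors fetch the jar from the driver, so the path only has to exist on the driver machine; spark.executor.extraClassPath, by contrast, expects the path to be present on every worker.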