Hi Yana,

Yes, typo in the email; the file name is correct, "spark-defaults.conf". Thanks though.

So it appears to work if, in the driver, I specify it as part of the SparkConf:
    val conf = new SparkConf()
      .setAppName(getClass.getSimpleName)
      .set("spark.executor.extraClassPath",
        "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

I thought the spark-defaults would be applied regardless of whether it was a spark-submit (driver) or a custom driver as in my case, but apparently I am mistaken.

This will work fine, as I can ensure that all hosts participating in the cluster have access to a common directory with the dependencies and then just set spark.executor.extraClassPath to "/some/shared/directory/lib/*.jar". If there is a better way to address this, let me know.

As for the spark-cassandra-connector 1.3.0-SNAPSHOT, I am building that from master. Haven't hit any issues with it yet.

-Todd
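(A minimal sketch of the shared-directory approach described above, assuming a hypothetical directory /some/shared/directory/lib that exists, with the same jars, at the same path on every worker host. The object name, app name, and the "dir/*" wildcard form are illustrative assumptions, not part of the original setup.)

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedClasspathApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SharedClasspathApp")           // placeholder app name
          .setMaster("spark://radtech.io:7077")       // standalone master from the thread
          // Executors resolve this path on their own host, so the directory
          // (with the same jars) must exist on every worker machine.
          // A JVM classpath wildcard takes the form "dir/*" and matches every jar in dir.
          .set("spark.executor.extraClassPath", "/some/shared/directory/lib/*")

        val sc = new SparkContext(conf)
        // ... job logic that uses classes from the shared jars goes here ...
        sc.stop()
      }
    }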
On Fri, May 22, 2015 at 9:39 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> Todd, I don't have any answers for you... other than the file is actually
> named spark-defaults.conf (not sure if you made a typo in the email or
> misnamed the file...). Do any other options from that file get read?
>
> I also wanted to ask if you built the spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
> from trunk or if they published a 1.3 drop somewhere -- I'm just starting
> out with Cassandra and discovered
> https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...
>
> On Fri, May 22, 2015 at 6:15 PM, Todd Nist <tsind...@gmail.com> wrote:
>
>> I'm using the spark-cassandra-connector from DataStax in a Spark Streaming
>> job launched from my own driver. It is connecting to a standalone cluster
>> on my local box, which has two workers running.
>>
>> This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
>> added the following entry to my $SPARK_HOME/conf/spark-default.conf:
>>
>> spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
>>
>> When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes up
>> just fine, as do the two workers, started with the following commands:
>>
>> Worker 1, port 8081:
>>
>> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
>>
>> Worker 2, port 8082:
>>
>> radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
>>
>> When I execute the driver, connecting to the master:
>>
>> sbt app/run -Dspark.master=spark://radtech.io:7077
>>
>> it starts up, but when the executors are launched they do not include the
>> entry from spark.executor.extraClassPath:
>>
>> 15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
>> 15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
>>
>> which will then cause the executor to fail with a ClassNotFoundException,
>> which I would expect:
>>
>> [WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:344)
>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I also notice that some of the entries on the executor classpath are
>> duplicated? This is a newly installed spark-1.3.1-bin-hadoop2.6 standalone
>> cluster, just to ensure I had nothing from testing in the way.
>>
>> I can set SPARK_CLASSPATH in $SPARK_HOME/spark-env.sh and it will pick up
>> the jar and append it fine.
>>
>> Any suggestions on what is going on here? It seems to just ignore whatever
>> I have in spark.executor.extraClassPath. Is there a different way to do
>> this?
>>
>> TIA.
>>
>> -Todd
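(Not from the thread itself, but a sketch of one alternative the closing question invites: instead of relying on spark.executor.extraClassPath, the driver can ship the connector assembly to the executors with SparkConf.setJars, i.e. the spark.jars setting. The object name is a placeholder and the jar path is simply reused from the thread; whether this fits the actual application is an assumption.)

    import org.apache.spark.{SparkConf, SparkContext}

    object ShippedJarApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ShippedJarApp")               // placeholder app name
          .setMaster("spark://radtech.io:7077")      // standalone master from the thread
          // Jars listed here are served by the driver and fetched by each
          // executor when it starts, so no shared directory or per-worker
          // copy of the assembly is required.
          .setJars(Seq(
            "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"))

        val sc = new SparkContext(conf)
        // ... Cassandra / streaming job logic goes here ...
        sc.stop()
      }
    }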