Hi guys,

Thanks for responding.

Re SPARK_CLASSPATH (Daoyuan): I think you are right. We tried it, and the 
warning we got said exactly that.

Re SparkConf (Daoyuan): We need the custom jar in the driver code itself, and 
in YARN client mode the driver JVM has already started by the time our 
application code could update the SparkConf, so I don’t see how that would work.
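
For illustration, our driver code does something roughly like this (the class 
name here is made up, the real one lives in our custom jar):

from pyspark import SparkContext

sc = SparkContext()
# hypothetical: the driver itself instantiates a class from the custom jar
# through the py4j gateway, so the jar has to be on the driver's classpath
# when the driver JVM starts
helper = sc._jvm.com.example.CustomHelper()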

Re EMR -u (Sonal): The documentation says that this is for EMR versions 2 and 
3; we’re using the current version (4.3.0). We’re also still in an exploratory 
phase where we use the UI to bring up the clusters. We’ll probably try it at 
some point, but it seems that the current CLI version 
(http://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html) 
doesn’t have a similar argument.

Gerhard

From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: Wed, Mar 09, 2016 04:28
To: Wang, Daoyuan
Cc: Gerhard Fiedler; user@spark.apache.org
Subject: Re: How to add a custom jar file to the Spark driver?

Hi Gerhard,

I just stumbled upon some documentation on EMR (link below). It seems there is 
a -u option to add jars in S3 to your classpath. Have you tried that?

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html


Best Regards,
Sonal
Founder, Nube Technologies<http://www.nubetech.co>
Reifier at Strata Hadoop World<https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 
2015<https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>




On Wed, Mar 9, 2016 at 11:50 AM, Wang, Daoyuan 
<daoyuan.w...@intel.com<mailto:daoyuan.w...@intel.com>> wrote:
Hi Gerhard,

How does EMR set its conf for Spark? I think if you set both SPARK_CLASSPATH and 
spark.driver.extraClassPath, Spark will ignore SPARK_CLASSPATH.
I think you can do this by reading the configuration from SparkConf, appending 
your custom settings to the corresponding key, and using the updated SparkConf 
to instantiate your SparkContext.
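
Something along these lines might work (just an untested sketch in PySpark; the 
jar path is a placeholder):

from pyspark import SparkConf, SparkContext

conf = SparkConf()
# pick up whatever extraClassPath is already configured (may be empty)
existing = conf.get("spark.driver.extraClassPath", "")
custom = "/path/to/our/custom/jar/*"  # placeholder
conf.set("spark.driver.extraClassPath",
         custom if not existing else existing + ":" + custom)
# instantiate the context with the updated conf
sc = SparkContext(conf=conf)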

Thanks,
Daoyuan

From: Gerhard Fiedler 
[mailto:gfied...@algebraixdata.com<mailto:gfied...@algebraixdata.com>]
Sent: Wednesday, March 09, 2016 5:41 AM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: How to add a custom jar file to the Spark driver?

We’re running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but 
we want to add a custom jar file to the driver.

When running on a local one-node standalone cluster, we just use 
spark.driver.extraClassPath and everything works:

spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/*  
our-python-script.py

But on EMR, this value is already set to something that their installation of 
Spark needs. Setting it to point to our custom jar overwrites the original 
setting rather than adding to it, which breaks Spark.

Our current workaround is to capture whatever EMR sets 
spark.driver.extraClassPath to once, then use that path and add our jar file to 
it. Of course this breaks whenever EMR changes this path in their cluster 
settings, and we wouldn’t necessarily notice that easily. This is how it looks:

spark-submit --conf 
spark.driver.extraClassPath=/path/to/our/custom/jar/*:/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
  our-python-script.py

We prefer not to do this…
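
(If we had to keep this approach, we could at least pick up the current value 
at submit time instead of hardcoding it. A rough sketch, assuming EMR keeps its 
defaults in /etc/spark/conf/spark-defaults.conf, which we haven’t verified:)

import subprocess

# read EMR's current spark.driver.extraClassPath from its defaults file
emr_classpath = ""
with open("/etc/spark/conf/spark-defaults.conf") as f:
    for line in f:
        if line.strip().startswith("spark.driver.extraClassPath"):
            emr_classpath = line.split(None, 1)[1].strip()

# prepend our jar and hand the combined path to spark-submit
combined = "/path/to/our/custom/jar/*" + (":" + emr_classpath if emr_classpath else "")
subprocess.check_call([
    "spark-submit",
    "--conf", "spark.driver.extraClassPath=" + combined,
    "our-python-script.py",
])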

We tried the spark-submit argument --jars, but it didn’t seem to do anything. 
Like this:

spark-submit --jars /path/to/our/custom/jar/file.jar  our-python-script.py

We also tried to set CLASSPATH, but it didn’t seem to have any impact:

export CLASSPATH=/path/to/our/custom/jar/*
spark-submit  our-python-script.py

When using SPARK_CLASSPATH, we got warnings that it is deprecated, and the 
messages also seemed to imply that it affects the same configuration that is 
set by spark.driver.extraClassPath.


So, my question is: Is there a clean way to add a custom jar file to a Spark 
configuration?

Thanks,
Gerhard

