Hi Gerhard,

How does EMR set its configuration for Spark? I think if you set both SPARK_CLASSPATH and spark.driver.extraClassPath, Spark will ignore SPARK_CLASSPATH. I think you can do this by reading the existing configuration from SparkConf, appending your custom setting to the corresponding key, and using the updated SparkConf to instantiate your SparkContext.
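For example, something along these lines (an untested sketch; the jar path is just a placeholder, and note that in YARN client mode the driver JVM is already running by the time the Python code executes, so the driver class path in particular may not pick this up):

    # Sketch: append our jar to whatever EMR already set, instead of overwriting it.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf()  # picks up the properties spark-submit already loaded
    existing = conf.get("spark.driver.extraClassPath", "")
    custom = "/path/to/our/custom/jar/*"  # placeholder path
    conf.set("spark.driver.extraClassPath",
             existing + ":" + custom if existing else custom)
    sc = SparkContext(conf=conf)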
Thanks,
Daoyuan

From: Gerhard Fiedler [mailto:gfied...@algebraixdata.com]
Sent: Wednesday, March 09, 2016 5:41 AM
To: user@spark.apache.org
Subject: How to add a custom jar file to the Spark driver?

We're running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but we want to add a custom jar file to the driver.

When running on a local one-node standalone cluster, we just use spark.driver.extraClassPath and everything works:

    spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/* our-python-script.py

But on EMR, this value is set to something that is needed to make their installation of Spark work. Setting it to point to our custom jar overwrites the original setting rather than adding to it, and breaks Spark.

Our current workaround is to capture whatever EMR sets spark.driver.extraClassPath to, once, then use that path and add our jar file to it. Of course this breaks whenever EMR changes this path in their cluster settings, and we wouldn't necessarily notice that easily. This is how it looks:

    spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/*:/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/* our-python-script.py

We'd prefer not to do this...

We tried the spark-submit argument --jars, but it didn't seem to do anything. Like this:

    spark-submit --jars /path/to/our/custom/jar/file.jar our-python-script.py

We also tried to set CLASSPATH, but it doesn't seem to have any impact:

    export CLASSPATH=/path/to/our/custom/jar/*
    spark-submit our-python-script.py

When using SPARK_CLASSPATH, we got warnings that it is deprecated, and the messages also seemed to imply that it affects the same configuration that is set by spark.driver.extraClassPath.

So, my question is: Is there a clean way to add a custom jar file to a Spark configuration?

Thanks,
Gerhard
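One way to make the workaround above less brittle (a minimal, untested sketch; it assumes EMR keeps its defaults in /etc/spark/conf/spark-defaults.conf, which may vary between EMR releases) is to read EMR's current value at launch time and prepend the custom jar, instead of hardcoding a captured copy of the path:

    #!/usr/bin/env python
    # Sketch: build the driver class path from EMR's current defaults at launch
    # time, so changes to EMR's cluster settings are picked up automatically.
    # The defaults file location and jar path are assumptions, not confirmed
    # EMR behavior.
    import subprocess

    DEFAULTS = "/etc/spark/conf/spark-defaults.conf"  # assumed location
    CUSTOM_JAR = "/path/to/our/custom/jar/*"          # placeholder path

    emr_cp = ""
    with open(DEFAULTS) as f:
        for line in f:
            parts = line.split(None, 1)  # "key value" separated by whitespace
            if len(parts) == 2 and parts[0] == "spark.driver.extraClassPath":
                emr_cp = parts[1].strip()
                break

    classpath = CUSTOM_JAR + ":" + emr_cp if emr_cp else CUSTOM_JAR
    subprocess.check_call([
        "spark-submit",
        "--conf", "spark.driver.extraClassPath=" + classpath,
        "our-python-script.py",
    ])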