Hi guys,

Thanks for responding.
Re SPARK_CLASSPATH (Daoyuan): I think you are right. We tried it, and that’s what the warning we got said.

Re SparkConf (Daoyuan): We need the custom jar in the driver code, so I don’t know how that would work.

Re EMR -u (Sonal): The documentation says that this is for EMR versions 2 and 3; we’re using the current version (4.3.0). We’re also still in an exploratory phase where we use the UI to bring up the clusters. We’ll probably try it at some point, but it seems that the current CLI version (http://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html) doesn’t have a similar argument.

Gerhard

From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: Wed, Mar 09, 2016 04:28
To: Wang, Daoyuan
Cc: Gerhard Fiedler; user@spark.apache.org
Subject: Re: How to add a custom jar file to the Spark driver?

Hi Gerhard,

I just stumbled upon some documentation on EMR - link below. Seems there is a -u option to add jars in S3 to your classpath, have you tried that?

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html

Best Regards,
Sonal
Founder, Nube Technologies<http://www.nubetech.co>
Reifier at Strata Hadoop World<https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015<https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>

On Wed, Mar 9, 2016 at 11:50 AM, Wang, Daoyuan <daoyuan.w...@intel.com<mailto:daoyuan.w...@intel.com>> wrote:

Hi Gerhard,

How does EMR set its conf for Spark? I think if you set both SPARK_CLASSPATH and spark.driver.extraClassPath, Spark would ignore SPARK_CLASSPATH.

I think you can do this by reading the configuration from SparkConf, adding your custom settings to the corresponding key, and using the updated SparkConf to instantiate your SparkContext. (A rough Python sketch of this idea is appended at the end of the thread below.)

Thanks,
Daoyuan

From: Gerhard Fiedler [mailto:gfied...@algebraixdata.com<mailto:gfied...@algebraixdata.com>]
Sent: Wednesday, March 09, 2016 5:41 AM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: How to add a custom jar file to the Spark driver?

We’re running Spark 1.6.0 on EMR, in YARN client mode. We run Python code, but we want to add a custom jar file to the driver.

When running on a local one-node standalone cluster, we just use spark.driver.extraClassPath and everything works:

spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/* our-python-script.py

But on EMR, this value is set to something that is needed to make their installation of Spark work. Setting it to point to our custom jar overwrites the original setting rather than adding to it, and breaks Spark.

Our current workaround is to capture, once, whatever EMR sets spark.driver.extraClassPath to, then use that path and add our jar file to it. Of course this breaks when EMR changes this path in their cluster settings, and we wouldn’t necessarily notice it easily. This is how it looks:

spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/*:/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/* our-python-script.py

We prefer not to do this…
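
One way we might make this less brittle would be to read EMR’s current value at submit time instead of hard-coding it. The following is only a rough, untested sketch of that idea; the spark-defaults.conf location is an assumption about where EMR keeps its defaults, and the wrapper script itself is hypothetical:

#!/usr/bin/env python
# submit_with_jar.py -- hypothetical wrapper around spark-submit.
# Reads spark.driver.extraClassPath from what we assume is EMR's
# defaults file and prepends our custom jar directory, so that the
# EMR-provided entries are preserved.
import subprocess

DEFAULTS = "/etc/spark/conf/spark-defaults.conf"  # assumed EMR location
CUSTOM_JARS = "/path/to/our/custom/jar/*"

emr_classpath = ""
with open(DEFAULTS) as f:
    for line in f:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "spark.driver.extraClassPath":
            emr_classpath = parts[1].strip()

combined = CUSTOM_JARS + ":" + emr_classpath if emr_classpath else CUSTOM_JARS

subprocess.check_call([
    "spark-submit",
    "--conf", "spark.driver.extraClassPath=" + combined,
    "our-python-script.py",
])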
We tried the spark-submit --jars argument, like this, but it didn’t seem to do anything:

spark-submit --jars /path/to/our/custom/jar/file.jar our-python-script.py

We also tried to set CLASSPATH, but it doesn’t seem to have any impact:

export CLASSPATH=/path/to/our/custom/jar/*
spark-submit our-python-script.py

When using SPARK_CLASSPATH, we got warnings that it is deprecated, and the messages also seemed to imply that it affects the same configuration that is set by spark.driver.extraClassPath.

So, my question is: Is there a clean way to add a custom jar file to a Spark configuration?

Thanks,
Gerhard
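
For reference, Daoyuan’s SparkConf suggestion quoted above would look roughly like the sketch below in the Python driver script. This is only a sketch of the idea, with placeholder paths; in YARN client mode the driver JVM is already running by the time this code executes, so it is unclear whether the jar would actually end up on the driver classpath (which is the concern raised at the top of the thread).

from pyspark import SparkConf, SparkContext

# Rough sketch of the suggestion: read the existing value, prepend our
# custom jar path, and build the SparkContext from the updated SparkConf.
# The jar path is a placeholder.
conf = SparkConf()
existing = conf.get("spark.driver.extraClassPath", "")
custom = "/path/to/our/custom/jar/*"
conf.set("spark.driver.extraClassPath",
         custom + ":" + existing if existing else custom)

sc = SparkContext(conf=conf)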