Hey,

I think you are hitting OOZIE-2658
<https://issues.apache.org/jira/browse/OOZIE-2658> there.
If I understand correctly, backporting and recompiling this change into
your Oozie distribution is not an option.
There are two workarounds that I can think of:

1)
- run the job without the --driver-class-path option
- in the Launcher logs, look for the list of arguments passed to
spark-submit
<https://github.com/apache/oozie/blame/master/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java#L249-L254>.
- add the value of spark.driver.extraClassPath to your workflow.xml and
append "/usr/hdp/current/hbase-client/conf" to it manually (watch out for
the OS-specific path separator).
- run the updated workflow.

This way the driver classpath will have every jar Spark needs to run with.
Keep in mind that after applying this workaround, you'll have to update
your workflow.xml if any of the jars included on that list changes its
name. Also, you'll have to repeat the process for every workflow, as every
workflow-specific jar has to be on that list.
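
The manual append from the third step can be sketched as below. The two jar
paths are made-up placeholders (the real value has to be copied from your own
Launcher log), append_classpath is a hypothetical helper, and ':' is assumed
as the separator because the cluster runs on Linux.

```shell
#!/bin/sh
# Hypothetical helper: joins an existing classpath and one extra entry.
# $1: existing classpath, $2: entry to add, $3: separator (defaults to ':')
append_classpath() {
    sep="${3:-:}"
    if [ -z "$1" ]; then
        # Nothing to join with, so the new entry is the whole classpath.
        printf '%s\n' "$2"
    else
        printf '%s%s%s\n' "$1" "$sep" "$2"
    fi
}

# Placeholder value; replace it with the list found in the Launcher log.
LOGGED_CLASSPATH='/path/to/first.jar:/path/to/second.jar'
HBASE_CONF='/usr/hdp/current/hbase-client/conf'

DRIVER_CLASSPATH=$(append_classpath "$LOGGED_CLASSPATH" "$HBASE_CONF")

# This is the option that would go into the spark-opts of the workflow.xml.
echo "--conf spark.driver.extraClassPath=$DRIVER_CLASSPATH"
```

The printed line is only as good as the list it was built from, so it has to
be regenerated whenever the logged list changes.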

2)
As the change only affects the Spark Sharelib
<https://github.com/apache/oozie/commit/e0016c93ad903bdee07fa63b9265382f1c6e3a62>,
recompile and create a new Spark Sharelib including the change.
- apply the change onto the sources of your Oozie distribution
- execute mvn install to create the necessary jars
- copy the spark sharelib from /user/oozie/share/lib/lib_timestamp/spark
into /user/oozie/share/lib/lib_timestamp/sparkhbase
- overwrite the oozie-sharelib-spark-VERSION.jar with the one you've just
created (it will be under the sharelib/spark/target folder)
- run oozie admin -sharelibupdate to make Oozie aware of the changes
- force the action to use the new sharelib by
passing oozie.action.sharelib.for.spark=sparkhbase in your configuration.
You may define this at action or workflow level.
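
The build, copy and update steps above might look roughly like this on the
command line. It is a sketch, not a tested procedure: lib_timestamp and
VERSION are left exactly as placeholders, the run wrapper and DRY_RUN switch
are my own additions so that the commands are only printed by default, and it
assumes the patch has already been applied to the sources, that you start in
the root of the Oozie source tree, and that OOZIE_URL is set for the oozie
client.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Placeholders: substitute the real lib_<timestamp> directory and jar version.
LIB_DIR='/user/oozie/share/lib/lib_timestamp'
NEW_JAR='sharelib/spark/target/oozie-sharelib-spark-VERSION.jar'

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build the jars from the patched sources.
run mvn install

# Copy the existing Spark sharelib on HDFS to a new directory named sparkhbase.
run hdfs dfs -cp "$LIB_DIR/spark" "$LIB_DIR/sparkhbase"

# Overwrite the sharelib jar in the copy with the freshly built one.
run hdfs dfs -put -f "$NEW_JAR" "$LIB_DIR/sparkhbase/"

# Make Oozie aware of the new sharelib.
run oozie admin -sharelibupdate
```

Review the printed commands first, then rerun with DRY_RUN=0 as a user that
is allowed to write to the sharelib directory.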

The drawback to this approach is that you'll have to create the new sharelib
manually after upgrading any of the jars. Also, when the upgrade to Oozie
4.3.0 happens, every workflow that uses the fixed sharelib has to be
updated by removing this property. This
<http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
is a great blog post about sharelib internals, if you're interested.


I hope this helps,
gp

On Wed, Jan 4, 2017 at 9:10 PM, Jan Hentschel <
jan.hentsc...@ultratendency.com> wrote:

> Hi,
>
>
>
> I have an HDP 2.5 cluster with Kerberos enabled running Oozie 4.2, Spark
> 1.6.2 and HBase 1.1.2.
>
>
>
> In this cluster, I have a Spark job which writes to HBase which I want to
> schedule via Oozie. Due to Kerberos, I had to make changes to core-site.xml
> to get Spark and HBase play nice with each other, which I must pass to the
> driver and executors in spark-submit via
>
>
>
> --conf "spark.executor.extraClassPath=/usr/hdp/current/hbase-client/conf"
> --driver-class-path "/usr/hdp/current/hbase-client/conf"
>
>
>
> When I put this into the spark-opts tag of the Spark action the
> configuration files do not get picked up and the authentication against
> HBase does not work. I assume the reason why the files are not picked up is
> the distributed cache. The ShareLib folder for Spark contains a version of
> the hbase-site.xml. I also uploaded the core-site.xml to the ShareLib
> folder for Spark, but as soon as I do that the Launch Mapper for the Spark
> action fails. Reason could be that the core-site.xml conflicts with the
> Oozie configuration.
>
>
>
> Question: How can I pass the core-site.xml file to the Spark action?
>
>
>
> I saw a lot of work regarding the Spark action in Oozie 4.3, but updating
> to this version is currently not an option.
>
>
>
> Best, Jan
>
>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>
