Hi,

This exception is caused by a missing jar on the classpath.
The jar containing the missing class (here the spark-cloudant connector,
which provides com.cloudant.spark.common.JsonStoreRDDPartition) has to be
added to the classpath of the Oozie action. This blog post
<http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
describes several ways to do it.
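For example (a sketch only; the jar name and HDFS path are assumptions you
have to adapt to your cluster), you can either ship the connector jar in the
workflow's lib/ directory, or point the Spark action at it explicitly:

```xml
<!-- Option 1 (sketch): put the connector jar next to workflow.xml on HDFS,
     e.g. under /user/oozie/measurecountWF/lib/ ; Oozie adds everything in
     the workflow's lib/ directory to the action classpath automatically. -->

<!-- Option 2 (sketch): reference the jar explicitly via spark-opts;
     the jar path below is an assumption, not a real location. -->
<spark-opts>--jars hdfs://XXX:8020/path/to/spark-cloudant.jar --conf spark.driver.extraJavaOptions=XXXX</spark-opts>
```

Option 1 is usually simpler for a single workflow; the sharelib approach from
the blog post above is better when several workflows need the same jar.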

I've never tried to run a SparkR application from Oozie. I guess it can be
done, but in its current state it needs some manual work:

According to the Spark repository
<https://github.com/apache/spark/tree/master/R>, the SparkR libraries should
be under $SPARK_HOME/R/lib, and $R_HOME should also be set for the job.
$SPARK_HOME is set to the current directory in the Oozie launcher after
OOZIE-2482, and you could add the SparkR libraries to the Spark sharelib to
make them available in the action.
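A rough sketch of those steps (the sharelib path, timestamp directory, and
Oozie URL are all assumptions; check your cluster's actual sharelib location
first):

```shell
# Sketch only: paths and host names below are placeholders, not real values.
# Copy the SparkR libraries from a node where Spark is installed into the
# Spark sharelib directory on HDFS:
hdfs dfs -put $SPARK_HOME/R/lib /user/oozie/share/lib/lib_<timestamp>/spark/R/lib

# Tell the Oozie server to pick up the updated sharelib without a restart:
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate
```

You can verify what the Spark sharelib contains afterwards with
`oozie admin -shareliblist spark`.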
It's not guaranteed that it will work after these steps, but there's a
chance. I'd be delighted to hear about the result if you have the time to
try to make this work.

Thanks,
gp


On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
wrote:

> Hi:
> I have an issue running SparkR from Oozie; could you please help me?
> I'm trying to run a SparkR job through Oozie in yarn-client mode, and I
> have installed the R package on all my nodes.
>
> job.properties is like:
> nameNode=hdfs://XXX:8020
> jobTracker=XXX:8050
> master=yarn-client
> queueName=default
> oozie.use.system.libpath=true
> oozie.wf.application.path=/user/oozie/measurecountWF
>
> The workflow is like:
> <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
>   <global>
>     <configuration>
>       <property>
>         <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>         <value>SPARK_HOME=XXXX</value>
>       </property>
>     </configuration>
>   </global>
>   <start to="sparkAction"/>
>   <action name="sparkAction">
>     <spark xmlns="uri:oozie:spark-action:0.1">
>       <job-tracker>${jobTracker}</job-tracker>
>       <name-node>${nameNode}</name-node>
>       <master>${master}</master>
>       <name>measurecountWF</name>
>       <jar>measurecount.R</jar>
>       <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>     </spark>
>     <ok to="end"/>
>     <error to="fail"/>
>   </action>
>   <kill name="fail">
>     <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>   </kill>
>   <end name="end"/>
> </workflow-app>
>
> It failed with class not found exception.
>
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
> com.cloudant.spark.common.JsonStoreRDDPartition
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
> Calls: sql -> callJMethod -> invokeJava
> Execution halted
> Intercepting System.exit(1)
>
> Does Oozie support running SparkR in a Spark action, or should we wrap
> it in an ssh action instead?
>
> Thanks a lot
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>
