Hi:
SparkR can be run in the Oozie Spark action. I tried the simple SparkR
script under the Spark examples folder, and it ran successfully.
After setting up the R environment on your cluster, you only need to put
spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder.
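Assuming the workflow application lives at /user/oozie/sparkR on HDFS (an illustrative path; adjust it for your cluster, and note the assembly jar name varies by Spark version), staging those two files could look like:

```shell
# Create the workflow's lib/ directory on HDFS and stage the two files.
# Paths and file names below are illustrative; match them to your install.
hdfs dfs -mkdir -p /user/oozie/sparkR/lib
hdfs dfs -put $SPARK_HOME/lib/spark-assembly.jar /user/oozie/sparkR/lib/
hdfs dfs -put $SPARK_HOME/R/lib/sparkr.zip /user/oozie/sparkR/lib/
```

These commands require a running HDFS cluster, so treat them as a sketch of the layout rather than something to copy verbatim.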
Below is the workflow I use for yarn-cluster mode.
<action name="sparkAction">
        <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>${master}</master>
                <name>sparkRtest</name>
                <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
         <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
         </spark>
      <ok to="end"/>
      <error to="fail"/>
  </action>
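For completeness, the job.properties I would pair with this yarn-cluster workflow looks roughly like the following (host names and the application path are placeholders, not values from my actual setup):

```
nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
master=yarn-cluster
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=/user/oozie/sparkR
```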

Thanks


2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:

> Hi Peter:
> Thank you very much for your reply.
> I will have a try and tell you the result.
>
> 2016-11-12 5:02 GMT+08:00 Peter Cseh <gezap...@cloudera.com>:
>
>> Hi,
>>
>> This exception is caused by a missing jar on the classpath.
>> The needed jars should be added to the classpath in the Oozie action. This
>> blog post
>> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
>> describes several ways to do it.
>>
>> I've never tried to run a SparkR application from Oozie. I guess it can be
>> done, but in the current state it needs some manual work:
>>
>> According to Spark <https://github.com/apache/spark/tree/master/R>, the
>> SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should
>> also be set for the job.
>> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and
>> you could add the SparkR stuff to Spark sharelib to make it available in
>> the action.
>> It's not guaranteed that it will work after these steps, but there's a
>> chance. I would be delighted to hear about the result if you have the time
>> to try to make this work.
>>
>> Thanks,
>> gp
>>
>>
>> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
>> wrote:
>>
>> > Hi:
>> > I have an issue running SparkR through Oozie; could you please help me?
>> > I'm trying to run a SparkR job through Oozie in yarn-client mode, and I
>> > have installed the R package on all my nodes.
>> >
>> > job.properties is like:
>> > nameNode=hdfs://XXX:8020
>> > jobTracker=XXX:8050
>> > master=yarn-client
>> > queueName=default
>> > oozie.use.system.libpath=true
>> > oozie.wf.application.path=/user/oozie/measurecountWF
>> >
>> > The workflow is like:
>> > <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
>> > <global>
>> >             <configuration>
>> >                 <property>
>> >                     <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>> >                     <value>SPARK_HOME=XXXX</value>
>> >                 </property>
>> >             </configuration>
>> > </global>
>> > <start to="sparkAction"/>
>> >     <action name="sparkAction">
>> >         <spark xmlns="uri:oozie:spark-action:0.1">
>> >                 <job-tracker>${jobTracker}</job-tracker>
>> >                 <name-node>${nameNode}</name-node>
>> >                 <master>${master}</master>
>> >                 <name>measurecountWF</name>
>> >                 <jar>measurecount.R</jar>
>> >          <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>> >          </spark>
>> > <ok to="end"/>
>> >       <error to="fail"/>
>> >   </action>
>> >   <kill name="fail">
>> >         <message>Workflow failed, error
>> >         message[${wf:errorMessage(wf:lastErrorNode())}]
>> >         </message>
>> >   </kill>
>> >   <end name="end"/>
>> > </workflow-app>
>> >
>> > It failed with a ClassNotFoundException:
>> >
>> > org.apache.spark.SparkException: Job aborted due to stage failure:
>> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>> > in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
>> > com.cloudant.spark.common.JsonStoreRDDPartition
>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >         at java.lang.Class.forName0(Native Method)
>> >         at java.lang.Class.forName(Class.java:348)
>> >         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>> >         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> >         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> >         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> >         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> >         at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
>> > Calls: sql -> callJMethod -> invokeJava
>> > Execution halted
>> > Intercepting System.exit(1)
>> >
>> > Does Oozie support running SparkR in a Spark action, or should we only
>> > wrap it in an ssh action?
>> >
>> > Thanks a lot
>> >
>>
>>
>>
>> --
>> Peter Cseh
>> Software Engineer
>> <http://www.cloudera.com>
>>
>
>
