Hi:
SparkR can be run in an Oozie spark action. I tried to run the simple SparkR script under the Spark examples folder, and it was successful. After setting up the R environment on your cluster, you only need to put spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder. Below is the workflow I use for yarn-cluster mode.

<action name="sparkAction">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>sparkRtest</name>
        <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
        <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
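For reference, staging those two files plus the R script could look like the sketch below. The paths are assumptions for illustration (a hypothetical Spark install dir and the workflow path from the action above); substitute your own.

```shell
# Sketch: stage the SparkR dependencies into the workflow's lib folder on HDFS.
# SPARK_HOME and WF_DIR below are assumptions -- adjust for your cluster.
SPARK_HOME=${SPARK_HOME:-/usr/local/spark}   # hypothetical Spark install dir
WF_DIR=/user/oozie/sparkR                    # workflow application path on HDFS

# Print the hdfs commands; run them (or pipe to sh) on a cluster node.
cat <<EOF
hdfs dfs -mkdir -p $WF_DIR/lib
hdfs dfs -put -f $SPARK_HOME/lib/spark-assembly*.jar $WF_DIR/lib/
hdfs dfs -put -f $SPARK_HOME/R/lib/sparkr.zip $WF_DIR/lib/
hdfs dfs -put -f dataframe.R $WF_DIR/
EOF
```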
Thanks

2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:

> Hi Peter:
> Thank you very much for your reply.
> I will have a try and tell you the result.
>
> 2016-11-12 5:02 GMT+08:00 Peter Cseh <gezap...@cloudera.com>:
>
>> Hi,
>>
>> This exception is caused by a missing jar on the classpath.
>> The needed jars should be added to the classpath in the Oozie action. This
>> blogpost
>> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
>> describes several ways to do it.
>>
>> I've never tried to run a SparkR application from Oozie. I guess it can be
>> done, but in the current state it needs some manual work:
>>
>> According to Spark <https://github.com/apache/spark/tree/master/R>, the
>> SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should
>> also be set for the job.
>> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and
>> you could add the SparkR files to the Spark sharelib to make them
>> available in the action.
>> It's not guaranteed to work after these steps, but there's a chance. I
>> would be delighted to hear about the result if you have the time to try
>> to make this work.
>>
>> Thanks,
>> gp
>>
>>
>> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
>> wrote:
>>
>> > Hi:
>> > I have an issue with Oozie running SparkR; could you please help me?
>> > I try to run a SparkR job through Oozie in yarn-client mode, and I have
>> > installed the R package on all my nodes.
>> >
>> > job.properties is like:
>> > nameNode=hdfs://XXX:8020
>> > jobTracker=XXX:8050
>> > master=yarn-client
>> > queueName=default
>> > oozie.use.system.libpath=true
>> > oozie.wf.application.path=/user/oozie/measurecountWF
>> >
>> > The workflow is like:
>> > <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
>> >     <global>
>> >         <configuration>
>> >             <property>
>> >                 <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>> >                 <value>SPARK_HOME=XXXX</value>
>> >             </property>
>> >         </configuration>
>> >     </global>
>> >     <start to="sparkAction"/>
>> >     <action name="sparkAction">
>> >         <spark xmlns="uri:oozie:spark-action:0.1">
>> >             <job-tracker>${jobTracker}</job-tracker>
>> >             <name-node>${nameNode}</name-node>
>> >             <master>${master}</master>
>> >             <name>measurecountWF</name>
>> >             <jar>measurecount.R</jar>
>> >             <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>> >         </spark>
>> >         <ok to="end"/>
>> >         <error to="fail"/>
>> >     </action>
>> >     <kill name="fail">
>> >         <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>> >     </kill>
>> >     <end name="end"/>
>> > </workflow-app>
>> >
>> > It failed with a ClassNotFoundException:
>> >
>> > org.apache.spark.SparkException: Job aborted due to stage failure:
>> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>> > in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
>> > com.cloudant.spark.common.JsonStoreRDDPartition
>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >     at java.lang.Class.forName0(Native Method)
>> >     at java.lang.Class.forName(Class.java:348)
>> >     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>> >     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> >     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> >     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> >     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> >     at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
>> > Calls: sql -> callJMethod -> invokeJava
>> > Execution halted
>> > Intercepting System.exit(1)
>> >
>> > Does Oozie support running SparkR in a spark action? Or should we only
>> > wrap it in an ssh action?
>> >
>> > Thanks a lot
>>
>>
>> --
>> Peter Cseh
>> Software Engineer
>> <http://www.cloudera.com>