Hi,

This exception is caused by a missing jar on the classpath. The required jars should be added to the classpath in the Oozie action. This blog post <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/> describes several ways to do it.
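For example (an untested sketch — the HDFS path and jar file name below are made up; point them at wherever the connector jar actually lives), you could attach the jar to the action with a <file> element, or simply drop it into the lib/ directory next to workflow.xml:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    ...
    <jar>measurecount.R</jar>
    <!-- hypothetical location of the Cloudant connector jar in HDFS -->
    <file>${nameNode}/user/oozie/libs/spark-cloudant.jar</file>
</spark>
```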
I've never tried to run a SparkR application from Oozie. I guess it can be done, but in its current state it needs some manual work: according to the Spark repository <https://github.com/apache/spark/tree/master/R>, the SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should also be set for the job. $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and you could add the SparkR files to the Spark sharelib to make them available in the action. It's not guaranteed that it will work after these steps, but there's a chance. I would be delighted to hear about the result if you have the time to try to make this work.

Thanks,
gp

On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com> wrote:
> Hi:
> I have an issue with Oozie running SparkR; could you please help me?
> I am trying to run a SparkR job through Oozie in yarn-client mode, and I have
> installed the R package on all my nodes.
>
> job.properties is like:
> nameNode=hdfs://XXX:8020
> jobTracker=XXX:8050
> master=yarn-client
> queueName=default
> oozie.use.system.libpath=true
> oozie.wf.application.path=/user/oozie/measurecountWF
>
> The workflow is like:
> <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
>   <global>
>     <configuration>
>       <property>
>         <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>         <value>SPARK_HOME=XXXX</value>
>       </property>
>     </configuration>
>   </global>
>   <start to="sparkAction"/>
>   <action name="sparkAction">
>     <spark xmlns="uri:oozie:spark-action:0.1">
>       <job-tracker>${jobTracker}</job-tracker>
>       <name-node>${nameNode}</name-node>
>       <master>${master}</master>
>       <name>measurecountWF</name>
>       <jar>measurecount.R</jar>
>       <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>     </spark>
>     <ok to="end"/>
>     <error to="fail"/>
>   </action>
>   <kill name="fail">
>     <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>   </kill>
>   <end name="end"/>
> </workflow-app>
>
> It failed with a class not found exception.
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
> com.cloudant.spark.common.JsonStoreRDDPartition
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
> Calls: sql -> callJMethod -> invokeJava
> Execution halted
> Intercepting System.exit(1)
>
> Does Oozie support running SparkR in a Spark action? Or should we only wrap
> it in an ssh action?
>
> Thanks a lot

--
Peter Cseh
Software Engineer
<http://www.cloudera.com>
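P.S. As a rough, untested sketch of the manual steps I mean: copy the SparkR library from $SPARK_HOME/R/lib into the spark directory of the Oozie sharelib on HDFS, then set R_HOME for the launcher in the workflow's <global> section. The R_HOME value below is just an example and depends on where R is installed on your nodes:

```xml
<global>
    <configuration>
        <property>
            <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
            <!-- comma-separated env vars; adjust R_HOME to your installation -->
            <value>SPARK_HOME=.,R_HOME=/usr/lib/R</value>
        </property>
    </configuration>
</global>
```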