Just curious: how about using Scala itself to drive the workflow? I guess
that if you use an external tool (Oozie, etc.), you lose the advantage of
handing an RDD from one step to the next in memory; each step has to read
its input from HDFS instead.
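
To illustrate the point above, here is a minimal sketch, assuming both
"jobs" run inside one Scala driver program and share a single SparkContext.
The object name, the jobA/jobB helpers, the toy data, and the local[*]
master are all made up for illustration; they are not from any real
pipeline.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object InMemoryPipeline {
  // Hypothetical "Job A": builds an RDD and marks it to be cached in memory.
  def jobA(sc: SparkContext): RDD[Int] =
    sc.parallelize(1 to 1000).filter(_ % 2 == 0).cache()

  // Hypothetical "Job B": consumes Job A's RDD directly,
  // with no write to HDFS and read back in between.
  def jobB(input: RDD[Int]): Long =
    input.map(_.toLong).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // App name and master are placeholders for a local test run.
    val conf = new SparkConf().setAppName("in-memory-pipeline").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val intermediate = jobA(sc)
      println(s"Job B result: ${jobB(intermediate)}")
    } finally {
      sc.stop()
    }
  }
}
```

With an external scheduler that launches each job as a separate
application, jobA would typically have to save its output (for example with
saveAsTextFile) and jobB would have to load it again.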

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan



From:   "k.tham" <kevins...@gmail.com>
To:     u...@spark.incubator.apache.org, 
Date:   07/10/2014 01:20 PM
Subject:        Recommended pipeline automation tool? Oozie?



I'm just wondering what's the general recommendation for data pipeline
automation.

Say, I want to run Spark Job A, then B, then invoke script C, then do D,
and if D fails, do E, and if Job A fails, send email F, etc.
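
For reference, this control flow can be sketched in plain Scala with
scala.util.Try. This is only a rough sketch under stated assumptions: every
step is a hypothetical placeholder function that throws on failure, and a
failure in B or C simply stops the pipeline (that case is not specified
above).

```scala
import scala.util.{Failure, Success, Try}

object Pipeline {
  // All parameters are hypothetical placeholders; each one throws on failure.
  def run(
      jobA: () => Unit,
      jobB: () => Unit,
      scriptC: () => Unit,
      stepD: () => Unit,
      fallbackE: () => Unit,
      emailF: Throwable => Unit
  ): Try[Unit] =
    Try(jobA()) match {
      case Failure(err) =>
        emailF(err) // Job A failed: send the notification, then stop.
        Failure(err)
      case Success(_) =>
        for {
          _ <- Try(jobB())
          _ <- Try(scriptC())
          // If D fails, run E instead; E's own failure is propagated.
          _ <- Try(stepD()).recoverWith { case _ => Try(fallbackE()) }
        } yield ()
    }

  def main(args: Array[String]): Unit = {
    val result = run(
      jobA = () => println("A: Spark job"),
      jobB = () => println("B: Spark job"),
      scriptC = () => println("C: script"),
      stepD = () => throw new RuntimeException("D failed"),
      fallbackE = () => println("E: fallback for D"),
      emailF = err => println(s"F: email about ${err.getMessage}")
    )
    println(result)
  }
}
```

In a real driver, the placeholders would wrap actual Spark actions, and
script C could be launched through scala.sys.process.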

It looks like Oozie might be the best choice. But I'd like some
advice/suggestions.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-pipeline-automation-tool-Oozie-tp9319.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.

