Hi William,

I have hands-on experience with Pentaho for Hadoop, that is, the PDI (Pentaho Data Integration) module. PDI provides components (called "steps") that can check whether a file exists, in HDFS or somewhere else. If the file is not there yet, you can re-check every X minutes. A purely time-based trigger is also possible.
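To make the polling idea concrete, here is a minimal sketch of what such a file-check loop does, outside of PDI. All names (`wait_for_file`, `hdfs_file_exists`, the example path) are hypothetical; the only real dependency is the standard `hadoop fs -test -e` shell command, which exits 0 when the given HDFS path exists.

```python
import subprocess
import time


def hdfs_file_exists(path):
    # `hadoop fs -test -e` exits with status 0 when the path exists in HDFS.
    return subprocess.call(["hadoop", "fs", "-test", "-e", path]) == 0


def wait_for_file(path, interval_s=600, exists=hdfs_file_exists):
    # Re-check every `interval_s` seconds (10 minutes by default) until the
    # trigger file appears, then return so the caller can launch the Hive job.
    while not exists(path):
        time.sleep(interval_s)
```

The `exists` predicate is injectable, so the same loop can poll a local path or a stub during testing; in PDI the equivalent would be a "check if file exists" step wired into a loop in the job.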
Cheers,
Jasper

2011/11/30 Alejandro Abdelnur <t...@cloudera.com>

> William,
>
> Oozie workflow jobs support Hive actions and Oozie coordinator jobs
> support time/data activation of workflow jobs.
>
> Cheers.
>
> Alejandro
>
> On Tue, Nov 29, 2011 at 4:27 PM, William Kornfeld <wkornf...@baynote.com> wrote:
>
>> We are building an application that involves chains of M/R jobs, most
>> likely all will be written in Hive. We need to start a Hive job when one
>> or more prerequisite data sets appear (defined in the Hive sense as a new
>> partition having been populated with data) - OR - a particular time has
>> been reached.
>>
>> We know of two scheduling packages that appear to solve this problem:
>> Oozie & Pentaho (to which my company has a license).
>>
>> Does anyone have actual experience using either of these (or something
>> else) to schedule Hive jobs?
>>
>> William Kornfeld
>> Baynote

--
*Jasper Knulst*
Consultant | Incentro Den Haag
Gildeweg 5B, 2632 BD Nootdorp, The Netherlands
E: jasper.knu...@incentro.com
T: +31157640750
M: +31619667511
W: www.incentro.com
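The Oozie coordinator approach mentioned in the quoted reply can be sketched as a coordinator app whose action fires when a dataset instance materializes in HDFS. This is an illustrative config fragment only: the app name, dates, HDFS paths, and workflow location are all made-up placeholders, and the element structure follows the `uri:oozie:coordinator:0.1` schema.

```xml
<coordinator-app name="hive-chain-coord" frequency="${coord:days(1)}"
                 start="2011-12-01T00:00Z" end="2012-12-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- One instance of this dataset is expected per day; a new Hive
         partition populated with data corresponds to one URI instance. -->
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="2011-12-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/warehouse/mytable/dt=${YEAR}-${MONTH}-${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action below is held until this instance exists in HDFS. -->
    <data-in name="ready" dataset="input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Workflow app containing the Hive action(s) to run. -->
      <app-path>hdfs://namenode/apps/hive-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The `frequency` attribute gives the time-based trigger, while `input-events` gives the data-based one; combining them yields "run at this time, once the prerequisite partition has appeared," which is the behavior asked about in the original question.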