We are building an application that involves chains of MapReduce jobs, most of which will likely be written in Hive. We need to start a Hive job either when one or more prerequisite data sets appear (defined in the Hive sense as a new partition having been populated with data) or when a particular time has been reached.
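To make the trigger condition concrete, here is a minimal sketch of the "data OR time" rule. It assumes that "prerequisites have appeared" means every required partition is populated, and that partitions are identified by hypothetical spec strings such as `logs/dt=2012-01-15`. It does not use any Oozie or Pentaho API. It only shows the decision a scheduler would have to make before launching the Hive job.

```python
from datetime import datetime


def should_start(required, available, now, deadline):
    """Hypothetical trigger check (not a scheduler API).

    Returns True once every required partition has been populated,
    or once the wall-clock deadline has been reached, whichever
    happens first.
    """
    data_ready = set(required) <= set(available)  # all prerequisites present
    time_reached = now >= deadline                 # fallback time trigger
    return data_ready or time_reached


if __name__ == "__main__":
    required = ["logs/dt=2012-01-15", "clicks/dt=2012-01-15"]
    available = ["logs/dt=2012-01-15"]
    deadline = datetime(2012, 1, 16, 6, 0)
    # Only one of two partitions is ready and the deadline has not passed.
    print(should_start(required, available, datetime(2012, 1, 16, 2, 0), deadline))
```

In a real deployment the `available` list would come from the metastore (for example, by listing a table's partitions), and the check would be re-evaluated periodically. That polling and dependency tracking is what a scheduler is expected to provide.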
We know of two scheduling packages that appear to solve this problem: Oozie and Pentaho (for which my company holds a license). Does anyone have hands-on experience using either of these (or something else) to schedule Hive jobs?

William Kornfeld
Baynote