Even more general context: Cascading does something similar, but I am not sure if it uses Hadoop's JobControl or manages dependencies itself. It definitely runs multiple jobs in parallel when the dependencies allow it.
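
To make "in parallel when the dependencies allow it" concrete, here is a minimal Python sketch of dependency-driven scheduling. It is not Cascading's or JobControl's actual code, and I am not claiming either one works this way internally. The function name `run_job_graph` and the job names `load_a`, `load_b` and `join_ab` are hypothetical. As a simplification, it runs jobs in waves: every job whose dependencies have finished is started together, and the next wave begins once the whole wave completes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job_graph(jobs, deps, max_workers=4):
    """Run callables in dependency order, in parallel where possible.

    jobs: dict mapping job name -> zero-argument callable (hypothetical jobs)
    deps: dict mapping job name -> set of job names that must finish first
    Returns the list of "waves"; each wave is a sorted list of job names
    that were submitted together.
    """
    done = set()
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(jobs):
            # A job is ready once all of its dependencies have completed.
            ready = sorted(
                name for name in jobs
                if name not in done and deps.get(name, set()) <= done
            )
            if not ready:
                raise ValueError("dependency cycle or unknown dependency")
            # Submit every ready job at once so independent jobs overlap.
            futures = [pool.submit(jobs[name]) for name in ready]
            for future in futures:
                future.result()  # wait for completion, re-raise job errors
            done.update(ready)
            waves.append(ready)
    return waves


# Two independent jobs feed a third, as in the join case discussed below.
jobs = {
    "load_a": lambda: "a",
    "load_b": lambda: "b",
    "join_ab": lambda: "a+b",
}
deps = {"join_ab": {"load_a", "load_b"}}

waves = run_job_graph(jobs, deps)
print(waves)
```

In this sketch the two load jobs land in the same wave and the join runs only after both have finished. A chain of joins where each one consumes the previous output would instead produce one job per wave, which is the sequential behavior described in the quoted thread.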

On 6/5/09 11:44 AM, "Alan Gates" <[email protected]> wrote:

> To add a little context, Pig uses Hadoop's JobControl to schedule it's
> jobs. Pig defines the dependencies between jobs in JobControl, and
> then submits the entire graph of jobs. So, using JobControl, does
> Hadoop schedule jobs serially or in parallel (assuming no dependencies)?
>
> Alan.
>
> On Jun 5, 2009, at 10:50 AM, Kristi Morton wrote:
>
>> Hi Pankil,
>>
>> Sorry about having to send my question email twice to the list...
>> the first time I sent it I had forgotten to subscribe to the list.
>> I resent it after subscribing, and your response to the first email
>> I sent did not make it into my inbox. I saw your response on the
>> archives list.
>>
>> So, to recap, you said:
>>
>> "We are not able to carry out all joins in a single job..we also
>> tried our hadoop code using Pig scripts and found that for each
>> join in PIG script new job is used.So basically what i think its a
>> sequential process to handle typesof join where output of one job
>> is required s an input to other one."
>>
>> I, too, have seen this sequential behavior with joins. However, it
>> seems like it could be possible for there to be two jobs executing
>> in parallel whose output is the input to the subsequent job. Is
>> this possible or are all jobs scheduled sequentially?
>>
>> Thanks,
>> Kristi
