Re: Hadoop scheduling question

Scott Carey Fri, 05 Jun 2009 12:19:45 -0700

Even more general context:
Cascading does something similar, but I am not sure if it uses Hadoop's
JobControl or manages dependencies itself.  It definitely runs multiple jobs
in parallel when the dependencies allow it.
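
For illustration only, here is a hypothetical Java sketch of the dependency-driven scheduling described above. It is not Hadoop's JobControl or Cascading's implementation. The names `DagSchedulerSketch` and `runAll` are made up, and each "job" is just a short sleep. A job starts as soon as every job it depends on has finished. That means two independent jobs, such as the two inputs to a join, can run at the same time, while the job that consumes both outputs waits.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical model of dependency-driven job scheduling (not Hadoop's API). */
public class DagSchedulerSketch {

    /** Starts each job once all of its dependencies finish; returns the finish order. */
    public static List<String> runAll(Map<String, List<String>> deps) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        try {
            for (String job : topoOrder(deps)) {
                // Parents already have futures because we walk jobs in topological order.
                CompletableFuture<?>[] parents = deps.get(job).stream()
                        .map(futures::get)
                        .toArray(CompletableFuture[]::new);
                // allOf() of an empty array is already complete, so root jobs start at once.
                CompletableFuture<Void> f = CompletableFuture.allOf(parents)
                        .thenRunAsync(() -> {
                            simulateWork();
                            finished.add(job);
                        }, pool);
                futures.put(job, f);
            }
            // Block until every job has completed.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).get();
        } finally {
            pool.shutdown();
        }
        return finished;
    }

    /** Orders jobs so that every job appears after all of its dependencies. */
    private static List<String> topoOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (order.size() < deps.size()) {
            boolean progress = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    done.add(e.getKey());
                    progress = true;
                }
            }
            if (!progress) {
                throw new IllegalArgumentException("cycle or unknown dependency");
            }
        }
        return order;
    }

    /** Stand-in for a MapReduce job: just sleeps briefly. */
    private static void simulateWork() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two independent input jobs feeding a final join job.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("loadA", List.of());
        deps.put("loadB", List.of());
        deps.put("join", List.of("loadA", "loadB"));
        System.out.println(runAll(deps));
    }
}
```

In this model the join job always finishes last. The relative order of loadA and loadB can vary from run to run because the two jobs execute concurrently.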
On 6/5/09 11:44 AM, "Alan Gates" wrote:

> To add a little context, Pig uses Hadoop's JobControl to schedule its
> jobs.  Pig defines the dependencies between jobs in JobControl, and
> then submits the entire graph of jobs.  So, using JobControl, does
> Hadoop schedule jobs serially or in parallel (assuming no dependencies)?
> 
> Alan.
> 
> On Jun 5, 2009, at 10:50 AM, Kristi Morton wrote:
> 
>> Hi Pankil,
>> 
>> Sorry for sending my question to the list twice; the first time
>> I sent it, I had forgotten to subscribe to the list.
>> I resent it after subscribing, and your response to the first email
>> I sent did not make it into my inbox.  I saw your response on the
>> archives list.
>> 
>> So, to recap, you said:
>> 
>> "We are not able to carry out all joins in a single job. We also
>> tried our Hadoop code using Pig scripts and found that a new job is
>> used for each join in a Pig script. So basically I think it is a
>> sequential process to handle these types of join, where the output
>> of one job is required as an input to the other one."
>> 
>> 
>> I, too, have seen this sequential behavior with joins.  However, it
>> seems like it could be possible for there to be two jobs executing
>> in parallel whose output is the input to the subsequent job.  Is
>> this possible or are all jobs scheduled sequentially?
>> 
>> Thanks,
>> Kristi
>> 
> 
>
