Re: Running multiple Pig jobs simultaneously on same data

勇胡 Wed, 15 Jun 2011 05:26:48 -0700

I read the link, and my impression is that HDFS is designed for
read-heavy workloads, not write-heavy ones ("A file once created,
written, and closed need not be changed.").


Regarding your description ("Immutable means that after creation it cannot
be modified."): if I understand correctly, you mean that HDFS cannot
implement "update" semantics the way a database does? A write operation
cannot be applied directly to a specific tuple or record? The result of a
write operation is just appended at the end of the file.
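
To make my question concrete, here is a minimal sketch of what I think an
"update" has to look like under write-once semantics: read the old file and
write a brand-new one, leaving the original untouched. This is only an
illustration on the local filesystem, not HDFS API code; the function name
rewrite_with_update, the tab-separated record layout, and the file names are
all my own assumptions.

```python
import os
import tempfile


def rewrite_with_update(src_path, dst_path, key, new_value):
    """Emulate an 'update' without modifying src_path in place.

    Copies every tab-separated record from src_path into a new file
    dst_path, replacing the value of records whose first field equals key.
    Returns the number of records that were changed.
    """
    updated = 0
    # Mode "x" refuses to overwrite an existing file (write-once style).
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "x", encoding="utf-8") as dst:
        for line in src:
            k, _, v = line.rstrip("\n").partition("\t")
            if k == key:
                v = new_value
                updated += 1
            dst.write(f"{k}\t{v}\n")
    return updated


def demo():
    """Build a small input file, 'update' one record, return both versions."""
    workdir = tempfile.mkdtemp()
    old = os.path.join(workdir, "part-v1.tsv")  # hypothetical file names
    new = os.path.join(workdir, "part-v2.tsv")
    with open(old, "x", encoding="utf-8") as f:
        f.write("alice\t1\nbob\t2\n")
    count = rewrite_with_update(old, new, "bob", "20")
    with open(old, encoding="utf-8") as f:
        old_text = f.read()
    with open(new, encoding="utf-8") as f:
        new_text = f.read()
    return count, old_text, new_text
```

If this picture is right, any number of concurrent readers of part-v1.tsv
never see a partial change, because nothing ever edits that file in place.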

Regards

Yong

2011/6/15 Nathan Bijnens <[email protected]>

> Immutable means that after creation it cannot be modified.
>
> HDFS applications need a write-once-read-many access model for files. A file
> once created, written, and closed need not be changed. This assumption
> simplifies data coherency issues and enables high throughput data access. A
> MapReduce application or a web crawler application fits perfectly with this
> model. There is a plan to support appending-writes to files in the future.
>
> http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
>
> Best regards,
>  Nathan
> ---
> [email protected] : http://nathan.gs : http://twitter.com/nathan_gs
>
>
> On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <[email protected]> wrote:
>
> > How can I understand immutable? I mean whether the HDFS implements lock
> > mechanism to obtain immutable data access when the concurrent tasks process
> > the same set of data or uses other strategy to implement immutable?
> >
> > Thanks
> >
> > Yong
> >
> > 2011/6/14 Bill Graham <[email protected]>
> >
> > > Yes, this is possible. Data in HDFS is immutable and MR tasks are spawned
> > > in their own VM so multiple concurrent jobs acting on the same input data
> > > are fine.
> > >
> > > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta <
> > > [email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > We have a requirement where we have to process same set of data (in
> > > > Hadoop cluster) by running multiple Pig jobs simultaneously.
> > > >
> > > > Any idea whether this is possible in Pig?
> > > >
> > > > Thanks,
> > > > Pradipta
> > > >
> > >
> >
>
