Re: Running multiple Pig jobs simultaneously on same data

Nathan Bijnens Wed, 15 Jun 2011 04:12:08 -0700

Immutable means that once the data has been created, it cannot be modified.
The HDFS design documentation describes the model as follows:

HDFS applications need a write-once-read-many access model for files. A file
once created, written, and closed need not be changed. This assumption
simplifies data coherency issues and enables high throughput data access. A
MapReduce application or a web crawler application fits perfectly with this
model. There is a plan to support appending-writes to files in the future.
http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
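
To make the idea concrete, here is a rough sketch in Python. It is only a
local-filesystem analogy and does not use the HDFS API; the names write_once,
read_job and run_concurrent_jobs are made up for this illustration. A file is
written once and closed, after which several "jobs" read it at the same time
with no locking. Since nobody changes the file after it is closed, every
reader should see the same bytes.

```python
import hashlib
import os
import shutil
import stat
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_once(path, data):
    """Create the file, write it, close it, then mark it read-only."""
    with open(path, "wb") as f:
        f.write(data)
    # Stand-in for the "once closed, need not be changed" assumption.
    os.chmod(path, stat.S_IRUSR)


def read_job(path):
    """One 'job': opens its own handle and reads the whole file, no locks."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def run_concurrent_jobs(path, n_jobs=4):
    """Run n_jobs readers at once over the same input; return their digests."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(read_job, [path] * n_jobs))


def demo():
    """Write one input file, read it from four concurrent jobs, clean up."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "input.dat")
    try:
        write_once(path, b"record\n" * 1000)
        return run_concurrent_jobs(path)
    finally:
        # Restore write permission so the temporary file can be removed.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    digests = demo()
    print("all readers agree:", len(set(digests)) == 1)
```

The readers need no coordination because the only write happens before any
of them starts, which is the same reason concurrent jobs over one input are
safe in the write-once-read-many model.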


Best regards,
  Nathan
---
[email protected] : http://nathan.gs : http://twitter.com/nathan_gs


On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <[email protected]> wrote:

> What does immutable mean here? Does HDFS implement a locking mechanism to
> keep the data immutable while concurrent tasks process the same data set,
> or does it use some other strategy?
>
> Thanks
>
> Yong
>
> 2011/6/14 Bill Graham <[email protected]>
>
> > Yes, this is possible. Data in HDFS is immutable and MR tasks are spawned
> > in their own VM, so multiple concurrent jobs acting on the same input data
> > are fine.
> >
> > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta <
> > [email protected]> wrote:
> >
> > > Hi All,
> > >
> > > We have a requirement to process the same set of data (in a Hadoop
> > > cluster) by running multiple Pig jobs simultaneously.
> > >
> > > Any idea whether this is possible in Pig?
> > >
> > > Thanks,
> > > Pradipta
> > >
> >
>
