Immutable means that after creation it cannot be modified. HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A MapReduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future. http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
Best regards, Nathan --- [email protected] : http://nathan.gs : http://twitter.com/nathan_gs On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <[email protected]> wrote: > How can I understand immutable? I mean whether the HDFS implements lock > mechanism to obtain immutable data access when the concurrent tasks process > the same set of data or uses other strategy to implement immutable? > > Thanks > > Yong > > 2011/6/14 Bill Graham <[email protected]> > > > Yes, this is possible. Data in HDFS is immutable and MR tasks are spawned > > in > > their own VM so multiple concurrent jobs acting on the same input data > are > > fine. > > > > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta < > > [email protected]> wrote: > > > > > Hi All, > > > > > > We have a requirement where we have to process same set of data (in > > Hadoop > > > cluster) by running multiple Pig jobs simultaneously. > > > > > > Any idea whether this is possible in Pig? > > > > > > Thanks, > > > Pradipta > > > > > >
