Yong,

Currently, HDFS does not support appending to a file, so once a file is created it literally cannot be changed (although it can be deleted, I suppose). This lets you avoid problems like the one where I do a SELECT * on the entire database and the DBA can't update a row while it runs, or other things like that. There are some append patches in the works, but I am not sure how they handle the concurrency implications.
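The write-once-read-many rule described here can be illustrated with a toy in-memory store. This is a minimal Python sketch under stated assumptions, not HDFS code and not any Hadoop API: the names WriteOnceStore, WriteOnceFile and ImmutableFileError are hypothetical. It only models the rule itself: a file can be written while it is being created, read any number of times, or deleted, but not changed after it is closed.

```python
class ImmutableFileError(Exception):
    """Hypothetical error: raised when code tries to change a closed file."""


class WriteOnceFile:
    """Toy write-once file (not the HDFS API): writable until close(), frozen after."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._chunks: list[bytes] = []
        self._closed = False

    def write(self, data: bytes) -> None:
        # Writes are accepted only while the file is still being created.
        if self._closed:
            raise ImmutableFileError(f"{self.path} is closed and cannot be changed")
        self._chunks.append(data)

    def close(self) -> None:
        # From here on the contents are frozen.
        self._closed = True

    def read(self) -> bytes:
        # Once the file is closed its contents never change, so any number
        # of readers can call this without taking a lock.
        return b"".join(self._chunks)


class WriteOnceStore:
    """Toy namespace: files can be created, read, or deleted, but never updated."""

    def __init__(self) -> None:
        self._files: dict[str, WriteOnceFile] = {}

    def create(self, path: str) -> WriteOnceFile:
        if path in self._files:
            raise FileExistsError(path)
        f = WriteOnceFile(path)
        self._files[path] = f
        return f

    def read(self, path: str) -> bytes:
        return self._files[path].read()

    def delete(self, path: str) -> None:
        # Deleting a whole file is allowed; rewriting part of it is not.
        del self._files[path]

    def exists(self, path: str) -> bool:
        return path in self._files


if __name__ == "__main__":
    store = WriteOnceStore()
    f = store.create("/data/part-00000")
    f.write(b"a,1\n")
    f.write(b"b,2\n")
    f.close()
    print(store.read("/data/part-00000"))  # b'a,1\nb,2\n'
    try:
        f.write(b"c,3\n")  # an "update" after close is rejected
    except ImmutableFileError as err:
        print("rejected:", err)
```

In this model the only way to "change" data is to write a new file and delete the old one, which is roughly the database-style UPDATE that the thread says HDFS does not offer.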
Make sense?

Jon

2011/6/15 勇胡 <[email protected]>

> I read the link, and I just felt that HDFS is designed for read-frequently
> operations, not for write-frequently ones ("A file once created, written,
> and closed need not be changed.").
>
> As for your description ("Immutable means that after creation it cannot be
> modified."), if I understand correctly, you mean that HDFS cannot implement
> "update" semantics the same way as in the database world? A write operation
> cannot be applied directly to a specific tuple or record, and the result of
> a write operation is just appended at the end of the file?
>
> Regards
>
> Yong
>
> 2011/6/15 Nathan Bijnens <[email protected]>
>
> > Immutable means that after creation it cannot be modified.
> >
> > HDFS applications need a write-once-read-many access model for files. A
> > file once created, written, and closed need not be changed. This assumption
> > simplifies data coherency issues and enables high throughput data access. A
> > MapReduce application or a web crawler application fits perfectly with
> > this model. There is a plan to support appending-writes to files in the
> > future.
> >
> > http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
> >
> > Best regards,
> > Nathan
> > ---
> > [email protected] : http://nathan.gs : http://twitter.com/nathan_gs
> >
> > On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <[email protected]> wrote:
> >
> > > How should I understand "immutable"? I mean, does HDFS implement a lock
> > > mechanism to provide immutable data access when concurrent tasks process
> > > the same set of data, or does it use some other strategy?
> > >
> > > Thanks
> > >
> > > Yong
> > >
> > > 2011/6/14 Bill Graham <[email protected]>
> > >
> > > > Yes, this is possible. Data in HDFS is immutable and MR tasks are
> > > > spawned in their own VM, so multiple concurrent jobs acting on the same
> > > > input data are fine.
> > > >
> > > > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > We have a requirement where we have to process the same set of data
> > > > > (in a Hadoop cluster) by running multiple Pig jobs simultaneously.
> > > > >
> > > > > Any idea whether this is possible in Pig?
> > > > >
> > > > > Thanks,
> > > > > Pradipta
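Bill's answer, that several jobs can safely read the same input at once because nobody can modify it, can be sketched the same way. The Python below is a toy illustration under assumptions, not Hadoop or Pig code: threads stand in for Hadoop's separate task VMs, the two job functions are hypothetical, and an immutable tuple of bytes stands in for an HDFS input file. No lock is taken because none of the jobs can change the data they share.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Shared input: a tuple of bytes lines. Both tuple and bytes are immutable in
# Python, so concurrent readers can never observe a half-applied change.
INPUT: tuple[bytes, ...] = (
    b"hdfs pig hadoop",
    b"pig pig hdfs",
    b"hadoop mapreduce",
)


def word_count_job(lines: tuple[bytes, ...]) -> Counter:
    """Hypothetical job #1: counts words. It only reads its input."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def total_bytes_job(lines: tuple[bytes, ...]) -> int:
    """Hypothetical job #2: sums line lengths. Also read-only."""
    return sum(len(line) for line in lines)


def run_jobs_concurrently(lines: tuple[bytes, ...]) -> tuple[Counter, int]:
    # Both jobs read the same shared input at the same time without any lock.
    with ThreadPoolExecutor(max_workers=2) as pool:
        wc_future = pool.submit(word_count_job, lines)
        bytes_future = pool.submit(total_bytes_job, lines)
        return wc_future.result(), bytes_future.result()


if __name__ == "__main__":
    counts, total = run_jobs_concurrently(INPUT)
    print(counts.most_common(2))
    print(total)
```

Because the input cannot change underneath the jobs, running them concurrently gives the same results as running them one after another, which is the property the thread relies on when several Pig jobs read the same data.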
