In other filesystems such as ext4, an overwrite updates the existing physical storage in place. Why does HDFS allocate new blocks every time an overwrite takes place instead of updating the existing ones? Isn't this an overhead?
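
As a minimal sketch of the behaviour I'm asking about (assuming a reachable cluster and only the public FileSystem client API; the path and payload are made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OverwriteSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/overwrite-demo.txt");  // hypothetical path

        // First create: the NameNode adds an INode and allocates fresh blocks.
        FSDataOutputStream out = fs.create(p, true /* overwrite */);
        out.writeBytes("version 1");
        out.close();

        // Overwrite of the same path: today the old INode and its blocks are
        // discarded and a brand-new file with new blocks is created, rather
        // than the data being rewritten in place as ext4 would do.
        out = fs.create(p, true /* overwrite */);
        out.writeBytes("version 2");
        out.close();

        fs.close();
      }
    }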
> I know about the current behaviour of HDFS. I am proposing the new
> behaviour I mentioned in my first mail.
>
> In Hadoop-0.20.2, a new block is allocated and stored at the datanodes,
> and a new INode is created in the namespace. Why is an overwrite treated
> as a file creation operation?
>
> -vidur
>
>> Hi Vidur,
>>
>> I'm not following. The "overwrite" flag causes the file to be
>> overwritten starting at offset 0 - it doesn't allow you to retain any
>> bit of the preexisting file. It's equivalent to a remove followed by a
>> create. Think of it like O_TRUNC.
>>
>> -Todd
>>
>> On Mon, Jun 21, 2010 at 10:03 PM, Vidur Goyal
>> <vi...@students.iiit.ac.in> wrote:
>>
>>> Dear Todd,
>>>
>>> By truncating I meant removing unused *blocks* from the namespace and
>>> letting them be garbage collected. There would be no truncation of the
>>> last block (even if it is not full). This way, rather than garbage
>>> collecting all the blocks of a file, we would only be garbage
>>> collecting the remaining blocks.
>>>
>>> -vidur
>>>
>>>> HDFS assumes in hundreds of places that blocks never shrink. So,
>>>> there is no option to truncate a block.
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Jun 21, 2010 at 9:41 PM, Vidur Goyal
>>>> <vi...@students.iiit.ac.in> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> In FSNamesystem#startFileInternal, whenever the overwrite flag is
>>>>> set, why is the INode removed from the namespace and a new
>>>>> INodeFileUnderConstruction created? Why can't we convert the same
>>>>> INode to an INodeFileUnderConstruction and start writing to the same
>>>>> blocks at the same datanodes (after incrementing the GS), followed
>>>>> by either truncating the remaining blocks (if the file size
>>>>> decreases) or allocating new blocks (if the file size increases)?
>>>>> This would reduce data redundancy and the work of the garbage
>>>>> collector, and would increase security.
>>>>>
>>>>> vidur
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
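
P.S. To make the O_TRUNC analogy in the quoted thread concrete: with the
current semantics, the two client-side sequences below should be
indistinguishable (a hedged sketch; the path is made up, and I am assuming
only the public FileSystem API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncEquivalence {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/trunc-demo.txt");  // hypothetical path

        // Sequence A: an explicit remove followed by a create.
        fs.delete(p, false /* non-recursive */);
        fs.create(p, false /* fail if the path already exists */).close();

        // Sequence B: a single create with the overwrite flag. Like open(2)
        // with O_TRUNC, nothing of the preexisting file is retained.
        fs.create(p, true /* overwrite */).close();

        fs.close();
      }
    }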