In other filesystems such as ext4, an overwrite updates the existing physical storage in place. Why does HDFS allocate new blocks every time an overwrite takes place instead of updating the existing ones? Isn't this an overhead?
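
As a minimal sketch of the behaviour I'm asking about (assuming a reachable cluster and only the public FileSystem client API; the path and payload are made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OverwriteSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/overwrite-demo.txt");  // hypothetical path

        // First create: the NameNode adds an INode and allocates fresh blocks.
        FSDataOutputStream out = fs.create(p, true /* overwrite */);
        out.writeBytes("version 1");
        out.close();

        // Overwrite of the same path: today the old INode and its blocks are
        // discarded and a brand-new file with new blocks is created, rather
        // than the data being rewritten in place as ext4 would do.
        out = fs.create(p, true /* overwrite */);
        out.writeBytes("version 2");
        out.close();

        fs.close();
      }
    }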
> I know about the current behaviour of HDFS. I am proposing the new
> behaviour I mentioned in my first mail.
>
> In Hadoop-0.20.2, a new block is allocated and stored at the datanodes,
> and a new INode is created in the namespace. Why is an overwrite treated
> as a file creation operation?
>
> -vidur
>
>> Hi Vidur,
>>
>> I'm not following. The "overwrite" flag causes the file to be
>> overwritten starting at offset 0 - it doesn't allow you to retain any
>> bit of the preexisting file. It's equivalent to a remove followed by a
>> create. Think of it like O_TRUNC.
>>
>> -Todd
>>
>> On Mon, Jun 21, 2010 at 10:03 PM, Vidur Goyal
>> <vi...@students.iiit.ac.in> wrote:
>>
>>> Dear Todd,
>>>
>>> By truncating I meant removing unused *blocks* from the namespace and
>>> letting them be garbage collected. There would be no truncation of the
>>> last block (even if it is not full). This way, rather than garbage
>>> collecting all the blocks of a file, we would only be garbage
>>> collecting the remaining blocks.
>>>
>>> -vidur
>>>
>>>> HDFS assumes in hundreds of places that blocks never shrink. So,
>>>> there is no option to truncate a block.
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Jun 21, 2010 at 9:41 PM, Vidur Goyal
>>>> <vi...@students.iiit.ac.in> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> In FSNamesystem#startFileInternal, whenever the overwrite flag is
>>>>> set, why is the INode removed from the namespace and a new
>>>>> INodeFileUnderConstruction created? Why can't we convert the same
>>>>> INode to an INodeFileUnderConstruction and start writing to the same
>>>>> blocks at the same datanodes (after incrementing the GS), followed
>>>>> by either truncating the remaining blocks (if the file size
>>>>> decreases) or allocating new blocks (if the file size increases)?
>>>>> This would reduce data redundancy and the work of the garbage
>>>>> collector, and would increase security.
>>>>>
>>>>> vidur
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
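
P.S. To make the O_TRUNC analogy in the quoted thread concrete: with the
current semantics, the two client-side sequences below should be
indistinguishable (a hedged sketch; the path is made up, and I am assuming
only the public FileSystem API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncEquivalence {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/trunc-demo.txt");  // hypothetical path

        // Sequence A: an explicit remove followed by a create.
        fs.delete(p, false /* non-recursive */);
        fs.create(p, false /* fail if the path already exists */).close();

        // Sequence B: a single create with the overwrite flag. Like open(2)
        // with O_TRUNC, nothing of the preexisting file is retained.
        fs.create(p, true /* overwrite */).close();

        fs.close();
      }
    }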