Hi Dave,

Your opinion is very much appreciated.

Thanks,
--Konstantin

On Wed, Mar 21, 2012 at 5:36 AM, Dave Shine
<dave.sh...@channelintelligence.com> wrote:
> I am not a contributor to this project, so I don't know how much weight my 
> opinion carries.  But I have been hoping to see append become stable soon.  
> We are constantly dealing with the "small file problem", and I have written 
> M/R jobs to periodically roll up lots of small files into a few larger ones.  
> Having append would prevent me from needing to use up cluster resources 
> performing these tasks.
>
> Therefore, all things being equal I +1 making append work.  However, if the 
> level of complexity is as bad as Eli implies below, then I can understand 
> that perhaps it is not worth the effort. If it will cause too much technical 
> debt, then removing it makes sense.  But don't just remove it because you 
> don't believe there is a need for it.
>
> Thanks,
> Dave Shine
>
>
> -----Original Message-----
> From: Eli Collins [mailto:e...@cloudera.com]
> Sent: Tuesday, March 20, 2012 8:38 PM
> To: hdfs-dev@hadoop.apache.org
> Subject: [DISCUSS] Remove append?
>
> Hey gang,
>
> I'd like to get people's thoughts on the following proposal. I think we 
> should consider removing append from HDFS.
>
> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality issues. 
> It and sync were re-designed, re-implemented, and shipped in
> 21.0 (HDFS-265). To my knowledge, there has been no real production use. 
> Anecdotally people who worked on branch-20-append have told me they think the 
> new trunk code is substantially less well-tested than the branch-20-append 
> code (at least for sync, append was never well tested). It has certainly 
> gotten way less pounding from HBase users.
> The design however, is much improved, and people think we can get hsync (and 
> append) stabilized in trunk (mostly testing and bug fixing).
>
> Rationale follows..
>
> Append does not seem to be an important requirement, hflush was. There has 
> not been much demand for append, from users or downstream projects. Because 
> Hadoop 1.x does not have a working append implementation (see HDFS-3120; the 
> branch-20-append work focused on sync, not on getting append working), append 
> is not enabled by default, and downstream projects will want to support Hadoop 
> 1.x releases for years, most will not introduce dependencies on append 
> anyway. This is not to say demand does not exist, just that if it does, it's 
> been much smaller than security, sync, HA, backwards compatible RPC, etc. 
> This probably explains why, over 5 years after the original implementation 
> started, we don't have a stable release with append.
>
> Append introduces non-trivial design and code complexity, which is not worth 
> the cost if we don't have real users. Removing append means we have the 
> property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly 
> simplifies the implementation of other features like snapshots, HDFS-level 
> caching, dedupe, etc.
>
> The vast majority of the HDFS-265 effort is still leveraged w/o append. The 
> new data durability and read consistency behavior was the key part.
>
> GFS, which HDFS' design is based on, has append (and atomic record
> append) so obviously a workable design does not preclude append.
> However we also should not ape the GFS feature set simply because it exists. 
> I've had conversations with people who worked on GFS that regret adding 
> record append (see also http://queue.acm.org/detail.cfm?id=1594206). In 
> short, unless append is a real priority for our users I think we should focus 
> our energy elsewhere.
>
> Thanks,
> Eli
