I never brought it up on the CDH list because I was told during my CDH training 
(Dec 2010) that it was already there. When I later learned it was usable only 
for HBase, I just assumed it would be coming, eventually.

Dave


-----Original Message-----
From: Eli Collins [mailto:e...@cloudera.com] 
Sent: Wednesday, March 21, 2012 2:52 PM
To: hdfs-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Remove append?

Good point. I thought I'd start with devs first. If you can't get it past the devs, 
there's no reason to go further.

Also, users will tell you they want everything. I'd like to root-cause this; e.g., 
if they want append to solve the small-files problem, I'd like to know whether 
solving the latter means we don't have to do the former.

ps - fwiw, the cdh-user@ mailing list has 800 people on it and append is rarely 
requested there. Ditto in customer conversations. However, the user base 
continues to grow rapidly and change in makeup, so the past isn't necessarily a 
good predictor.

Thanks,
Eli

On Wed, Mar 21, 2012 at 11:31 AM, Tim Broberg <tim.brob...@exar.com> wrote:
> No specific advice on this particular issue, but in general, I learned the 
> hard way to stop asking the question, "Feature X is hard to support, is 
> anybody really going to use this?" *Every time* I have asked this question, I 
> get the answer I want to hear. *Every time*, they come back and ask for the 
> feature back later and it's more work than it would have been if I had just 
> planned for it from the beginning.
>
> YMMV, and I'm always asking marketing guys, whereas you're asking developers.
>
> Ok, there's one piece of specific advice: Go find the people that will tell 
> you what you don't want to hear. Ask the hdfs-user list whether they need the 
> feature, rather than hdfs-dev.
>
> We all have too much empathy for your position here to make you suffer.
>
>    - Tim.
>
> -----Original Message-----
> From: Eli Collins [mailto:e...@cloudera.com]
> Sent: Tuesday, March 20, 2012 8:38 PM
> To: hdfs-dev@hadoop.apache.org
> Subject: [DISCUSS] Remove append?
>
> Hey gang,
>
> I'd like to get people's thoughts on the following proposal. I think we 
> should consider removing append from HDFS.
>
> Where we are today: append was added in the 0.17-0.19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality 
> issues. It and sync were redesigned, reimplemented, and shipped in
> 0.21.0 (HDFS-265). To my knowledge, there has been no real production use. 
> Anecdotally, people who worked on branch-20-append have told me they think the 
> new trunk code is substantially less well tested than the branch-20-append 
> code (at least for sync; append was never well tested). It has certainly 
> gotten far less pounding from HBase users.
> The design, however, is much improved, and people think we can get hsync (and 
> append) stabilized in trunk (mostly through testing and bug fixing).
>
> Rationale follows:
>
> Append does not seem to be an important requirement; hflush was. There has 
> not been much demand for append from users or downstream projects. Hadoop 
> 1.x does not have a working append implementation (see HDFS-3120; the 
> branch-20-append work focused on getting sync working, not append), and what 
> it does have is not enabled by default. Since downstream projects will want 
> to support Hadoop 1.x releases for years, most will not introduce 
> dependencies on append anyway. This is not to say demand does not exist, just 
> that if it does, it's been much smaller than for security, sync, HA, 
> backwards-compatible RPC, etc. This probably explains why, over 5 years after 
> the original implementation started, we don't have a stable release with append.
>
> Append introduces non-trivial design and code complexity, which is not worth 
> the cost if we don't have real users. Removing append means we have the 
> property that HDFS blocks, once finalized, are immutable.
> This greatly simplifies the design and code, which in turn simplifies the 
> implementation of other features like snapshots, HDFS-level caching, 
> dedupe, etc.
>
> The vast majority of the HDFS-265 effort is still leveraged w/o append. The 
> new data durability and read consistency behavior was the key part.
>
> GFS, whose design HDFS is based on, has append (and atomic record
> append), so append obviously does not preclude a workable design.
> However, we also should not ape the GFS feature set simply because it exists. 
> I've had conversations with people who worked on GFS who regret adding 
> record append (see also http://queue.acm.org/detail.cfm?id=1594206). In 
> short, unless append is a real priority for our users, I think we should focus 
> our energy elsewhere.
>
> Thanks,
> Eli
>
