On Wed, Sep 26, 2012 at 04:17PM, Konstantin Shvachko wrote:
> Hi Todd,
> 
> > I had said previously that it's worth
> > discussing if several other people believe the same.
> 
> Well, let's put it on the general list for discussion then?
> Seems to me an important issue for Hadoop evolution in general.
> We keep growing the HDFS umbrella with competing technologies
> (http/web HDFS as an example) within it.
> Which makes the project harder to stabilize and release.
> Not touching MR/Yarn here.
> 
> > If at some point in the future, the internal APIs have fully
> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> > etc) then we can pull it out at that time.
> 
> By that time it will monolithically grow into HDFS and vice versa.
> 
> > I know that we plan to ship it as part of CDH and that it will be our
> > recommended way of running HA HDFS.
> 
> Sounds like CDH is moving well in release plans and otherwise.
> My concern is that if we add another 6000 lines of code to Hadoop-2,
> it will take yet another x months for stabilization.
> While it is not clear why people cannot just use NFS filers for shared
> storage, as you originally designed.
> 
> > distros. Moving it to an entirely separate standalone project will
> > just add extra work for these folks who, like us, think it's currently
> > the best option for HA log storage.
> 
> Don't know who these folks are. I see it as less work for the HDFS community,
> because there is no need for porting and supporting this project in two or
> more different versions.

From a pure integration perspective I also see such separation to be
beneficial, as the dependencies can be clearly defined and managed orthogonally
to the project's source code.

Regards,
  Cos

> 
> Thanks,
> --Konstantin
> 
> On Wed, Sep 26, 2012 at 10:50 AM, Todd Lipcon <t...@cloudera.com> wrote:
> > On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko
> > <shv.had...@gmail.com> wrote:
> >> I think this is a great work, Todd.
> >> And I think we should not merge it into trunk or other branches.
> >> As I suggested earlier on this list I think this should be spun off
> >> as a separate project or a subproject.
> >>
> >> - The code is well detached as a self contained package.
> >
> > The addition is mostly self-contained, but it makes use of a bunch of
> > "private" parts of HDFS and Common:
> > - Reuses all of the Hadoop security infrastructure, IPC, metrics, etc
> > - Coupled to the JournalManager interface which is still evolving. In
> > fact there were several patches in trunk which were done during the
> > development of this project, specifically to make this API more
> > general. There's still some further work to be done in this area on
> > the generic interface -- eg support for upgrade/rollback.
> > - The functional tests make use of a bunch of "private" HDFS APIs as well.
> >
> >> - It is a logically stand-alone project that can be replaced by other
> >> technologies.
> >> - If it is a separate project then there is no need to port it to
> >> other versions. You can package it as a dependent jar.
> >
> > Per above, it's not that separate, because in order to build it, we
> > had to make a number of changes to core HDFS internal interfaces. It
> > currently couldn't be used to store anything except for NN logs. It
> > would be a nice extension to truly separate it out into a
> > content-agnostic quorum-based edit log, but today it actually uses the
> > existing edit log validation code to determine valid lengths, etc.
> >
> >> - Finally, it will be a good precedent of spinning new projects out of
> >> HDFS rather than bringing everything under the HDFS umbrella.
> >>
> >> Todd, I had a feeling you were in favor of this direction?
> >
> > I'm not in favor of it - I had said previously that it's worth
> > discussing if several other people believe the same.
> >
> > I know that we plan to ship it as part of CDH and that it will be our
> > recommended way of running HA HDFS. If the community doesn't accept
> > the contribution, and prefers that we maintain it in a fork on github,
> > then it's worth hearing. But I imagine that many other community
> > members will want to either use it or ship it as part of their
> > distros. Moving it to an entirely separate standalone project will
> > just add extra work for these folks who, like us, think it's currently
> > the best option for HA log storage.
> >
> > If at some point in the future, the internal APIs have fully
> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> > etc) then we can pull it out at that time.
> >
> > -Todd
> >
> >> On Tue, Sep 25, 2012 at 4:58 PM, Eli Collins <e...@cloudera.com> wrote:
> >>> +1   Awesome work Todd.
> >>>
> >>> On Tue, Sep 25, 2012 at 4:02 PM, Todd Lipcon <t...@cloudera.com> wrote:
> >>>> Dear fellow HDFS developers,
> >>>>
> >>>> Per my email thread last week ("Heads up: merge for QJM branch soon"
> >>>> at http://markmail.org/message/vkyh5culdsuxdb6t) I would like to
> >>>> propose merging the HDFS-3077 branch into trunk. The branch has been
> >>>> active since mid July and has stabilized significantly over the last
> >>>> two months. It has passed the full test suite, findbugs, and release
> >>>> audit, and I think it's ready to merge at this point.
> >>>>
> >>>> The branch has been fully developed using the standard
> >>>> 'review-then-commit' (RTC) policy, and the design is described in
> >>>> detail in a document attached to HDFS-3077 itself. The code itself has
> >>>> been contributed by me, Aaron, and Eli, but I'd be remiss not to also
> >>>> acknowledge the contributions to the design from discussions with
> >>>> Suresh, Sanjay, Henry Robinson, Patrick Hunt, Ivan Kelly, Andrew
> >>>> Purtell, Flavio Junqueira, Ben Reed, Nicholas, Bikas, Brandon, and
> >>>> others. Additionally, special thanks to Andrew Purtell and Stephen Chu
> >>>> for their help with cluster testing.
> >>>>
> >>>> This initial VOTE is to merge only into trunk, but, following the
> >>>> pattern of automatic failover, I expect to merge it into branch-2
> >>>> within a few weeks as well. The merge to branch-2 should be clean, as
> >>>> both I and Andrew Purtell have been testing on branch-2-derived
> >>>> codebases in addition to trunk.
> >>>>
> >>>> Please cast your vote by EOD Friday 9/29. Given that the branch has
> >>>> only had small changes in the last few weeks, and there was a "heads
> >>>> up" last week, I trust this should be enough time for committers to
> >>>> cast their votes. Per our by-laws, we need a minimum of three binding
> >>>> +1 votes from committers.
> >>>>
> >>>> I will start the voting with my own +1.
> >>>>
> >>>> Thanks
> >>>> -Todd
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
