On Wed, Sep 26, 2012 at 04:17PM, Konstantin Shvachko wrote:
> Hi Todd,
>
> > I had said previously that it's worth
> > discussing if several other people believe the same.
>
> Well, let's put it on the general list for discussion then?
> Seems to me an important issue for Hadoop evolution in general.
> We keep growing the HDFS umbrella with competing technologies
> (http/web HDFS as an example) within it.
> Which makes the project harder to stabilize and release.
> Not touching MR/Yarn here.
>
> > If at some point in the future, the internal APIs have fully
> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> > etc) then we can pull it out at that time.
>
> By that time it will monolithically grow into HDFS and vice versa.
>
> > I know that we plan to ship it as part of CDH and will be our
> > recommended way of running HA HDFS.
>
> Sounds like CDH is moving well in release plans and otherwise.
> My concern is that if we add another 6000 lines of code to Hadoop-2,
> it will take yet another x months for stabilization.
> While it is not clear why people cannot just use NFS filers for shared
> storage, as you originally designed.
>
> > distros. Moving it to an entirely separate standalone project will
> > just add extra work for these folks who, like us, think it's currently
> > the best option for HA log storage.
>
> Don't know who these folks are. I see it as less work for the HDFS community,
> because there is no need for porting and supporting this project in two or
> more different versions.

From a pure integration perspective I also see such separation to be
beneficial, as the dependencies can be clearly defined and managed
orthogonally to the project's source code.

Regards,
  Cos

>
> Thanks,
> --Konstantin
>
> On Wed, Sep 26, 2012 at 10:50 AM, Todd Lipcon <t...@cloudera.com> wrote:
> > On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko
> > <shv.had...@gmail.com> wrote:
> >> I think this is a great work, Todd.
> >> And I think we should not merge it into trunk or other branches.
> >> As I suggested earlier on this list I think this should be spun off
> >> as a separate project or a subproject.
> >>
> >> - The code is well detached as a self contained package.
> >
> > The addition is mostly self-contained, but it makes use of a bunch of
> > "private" parts of HDFS and Common:
> > - Reuses all of the Hadoop security infrastructure, IPC, metrics, etc
> > - Coupled to the JournalManager interface which is still evolving. In
> > fact there were several patches in trunk which were done during the
> > development of this project, specifically to make this API more
> > general. There's still some further work to be done in this area on
> > the generic interface -- eg support for upgrade/rollback.
> > - The functional tests make use of a bunch of "private" HDFS APIs as well.
> >
> >> - It is a logically stand-alone project that can be replaced by other
> >> technologies.
> >> - If it is a separate project then there is no need to port it to
> >> other versions. You can package it as a dependent jar.
> >
> > Per above, it's not that separate, because in order to build it, we
> > had to make a number of changes to core HDFS internal interfaces. It
> > currently couldn't be used to store anything except for NN logs. It
> > would be a nice extension to truly separate it out into a
> > content-agnostic quorum-based edit log, but today it actually uses the
> > existing edit log validation code to determine valid lengths, etc.
> >
> >> - Finally, it will be a good precedent of spinning new projects out of
> >> HDFS rather than bringing everything under the HDFS umbrella.
> >>
> >> Todd, I had a feeling you were in favor of this direction?
> >
> > I'm not in favor of it - I had said previously that it's worth
> > discussing if several other people believe the same.
> >
> > I know that we plan to ship it as part of CDH and will be our
> > recommended way of running HA HDFS. If the community doesn't accept
> > the contribution, and prefers that we maintain it in a fork on github,
> > then it's worth hearing. But I imagine that many other community
> > members will want to either use it or ship it as part of their
> > distros. Moving it to an entirely separate standalone project will
> > just add extra work for these folks who, like us, think it's currently
> > the best option for HA log storage.
> >
> > If at some point in the future, the internal APIs have fully
> > stabilized (security, IPC, edit log streams, JournalManager, metrics,
> > etc) then we can pull it out at that time.
> >
> > -Todd
> >
> >> On Tue, Sep 25, 2012 at 4:58 PM, Eli Collins <e...@cloudera.com> wrote:
> >>> +1 Awesome work Todd.
> >>>
> >>> On Tue, Sep 25, 2012 at 4:02 PM, Todd Lipcon <t...@cloudera.com> wrote:
> >>>> Dear fellow HDFS developers,
> >>>>
> >>>> Per my email thread last week ("Heads up: merge for QJM branch soon"
> >>>> at http://markmail.org/message/vkyh5culdsuxdb6t) I would like to
> >>>> propose merging the HDFS-3077 branch into trunk. The branch has been
> >>>> active since mid July and has stabilized significantly over the last
> >>>> two months. It has passed the full test suite, findbugs, and release
> >>>> audit, and I think it's ready to merge at this point.
> >>>>
> >>>> The branch has been fully developed using the standard
> >>>> 'review-then-commit' (RTC) policy, and the design is described in
> >>>> detail in a document attached to HDFS-3077 itself. The code itself has
> >>>> been contributed by me, Aaron, and Eli, but I'd be remiss not to also
> >>>> acknowledge the contributions to the design from discussions with
> >>>> Suresh, Sanjay, Henry Robinson, Patrick Hunt, Ivan Kelly, Andrew
> >>>> Purtell, Flavio Junqueira, Ben Reed, Nicholas, Bikas, Brandon, and
> >>>> others. Additionally, special thanks to Andrew Purtell and Stephen Chu
> >>>> for their help with cluster testing.
> >>>>
> >>>> This initial VOTE is to merge only into trunk, but, following the
> >>>> pattern of automatic failover, I expect to merge it into branch-2
> >>>> within a few weeks as well. The merge to branch-2 should be clean, as
> >>>> both I and Andrew Purtell have been testing on branch-2-derived
> >>>> codebases in addition to trunk.
> >>>>
> >>>> Please cast your vote by EOD Friday 9/29. Given that the branch has
> >>>> only had small changes in the last few weeks, and there was a "heads
> >>>> up" last week, I trust this should be enough time for committers to
> >>>> cast their votes. Per our by-laws, we need a minimum of three binding
> >>>> +1 votes from committers.
> >>>>
> >>>> I will start the voting with my own +1.
> >>>>
> >>>> Thanks
> >>>> -Todd
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera