Alan, I think your proposal sounds great. --travis
On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <ga...@hortonworks.com> wrote:
> Carl, speaking just for myself and not as a representative of the HCat PPMC at this point, I am coming to agree with you that HCat integrating with Hive fully makes more sense.
>
> However, this makes the committer question even thornier. Travis and Namit, I think the shepherd proposal needs to lay out a clear and time bounded path to committership for HCat committers. Having HCat committers as second class Hive citizens for the long run will not be healthy. I propose the following as a starting point for discussion:
>
> All active HCat committers (those who have contributed or committed a patch in the last 6 months) will be made committers in the HCat portion only of Hive. In addition those committers will be assigned a particular shepherd who is a current Hive committer and who will be responsible for mentoring them towards full Hive committership. As a part of this mentorship the HCat committer will review patches of other contributors, contribute patches to Hive (both inside and outside of HCatalog), respond to user issues on the mailing lists, etc. It is intended that as a result of this mentorship program HCat committers can become full Hive committers in 6-9 months. No new HCat only committers will be elected in Hive after this. All Hive committers will automatically also have commit rights on HCatalog.
>
> Alan.
>
> On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>
>> On a functional level I don't think there is going to be much of a difference between the subproject option proposed by Travis and the other option where HCatalog becomes a TLP. In both cases HCatalog and Hive will have separate committers, separate code repositories, separate release cycles, and separate project roadmaps.
>> Aside from ASF bureaucracy, I think the only major difference between the two options is that the subproject route will give the rest of the community the false impression that the two projects have coordinated roadmaps and a process to prevent overlapping functionality from appearing in both projects. Consequently, if these are the only two options then I would prefer that HCatalog become a TLP.
>>
>> On the other hand, I also agree with many of the sentiments that have already been expressed in this thread, namely that the two projects are closely related and that it would benefit the community at large if the two projects could be brought closer together. Up to this point the major source of pain for the HCatalog team has been the frequent necessity of making changes on both the Hive and HCatalog sides when implementing new features in HCatalog. This situation is compounded by the ASF requirement that release artifacts may not depend on snapshot artifacts from other ASF projects. Furthermore, if Hive adds a dependency on HCatalog then it will be subject to these same problems (in addition to the gross circular dependency!).
>>
>> I think the best way to avoid these problems is for HCatalog to become a Hive submodule. In this scenario HCatalog would exist as a subdirectory in the Hive repository and would be distributed as a Hive artifact in future Hive releases. In addition to solving the problems I mentioned earlier, I think this would also help to assuage the concerns of many Hive committers who don't want to see the MetaStore split out into a separate project.
>>
>> Thanks.
>>
>> Carl
>>
>> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <nj...@fb.com> wrote:
>>
>>> I am fine with this. Any hive committer who wants to volunteer to be a hcat shepherd is welcome.
>>>
>>> On 12/14/12 7:01 AM, "Travis Crawford" <traviscrawf...@gmail.com> wrote:
>>>
>>>> Thanks for reviving this thread.
>>>> Reviewing the comments, everyone seems to agree HCatalog makes sense as a Hive subproject. I think that's great news for the Hadoop community.
>>>>
>>>> The discussion seems to have turned to one of committer permissions. I agree with the Hive folks' sentiment that it's something that must be earned. That said, I've found it challenging at times getting patches into Hive that would help earn taking on a hive committer responsibility.
>>>>
>>>> Proposal: if a couple hive committers can volunteer to be hcat shepherds, we can work with the shepherds when making hive changes in a timely manner. Conversely, we can help shepherd any hive committers who are interested in working more with hcat. There are certainly benefits to cross-committership, and this approach could help each other build a history of meaningful contributions and earn the privilege & responsibility of being committers.
>>>>
>>>> Thoughts?
>>>>
>>>> --travis
>>>>
>>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>> I initially was hesitant about hcatalog, mostly because I imagined we would end up in a spot very similar to this.
>>>>>
>>>>> Namely, the hcatalog folks are interested in making a metastore to support pig, hive, and map reduce. However I get the impression that many in hive do not care much to have a metastore that caters to everyone. Their needs are only based on what hive needs, which I believe is the wrong way to look at this situation.
>>>>>
>>>>> I thought to reply to this thread because I have been following this Jira: https://issues.apache.org/jira/browse/HIVE-3752
>>>>>
>>>>> On a high level I do not like this duplication of effort and code. If hive is compatible with hcatalog I do not see why we put off merging the two at all.
>>>>> Hive users would get an immediate benefit if Hive used hcatalog with no apparent downside. Meanwhile we are putting this off and staying in this awkward transition phase.
>>>>>
>>>>> Personally, I do not have a problem being a hive committer and not having hcatalog commit. None of the hive work I have done has ever touched the metastore. Also of the thousands of jiras and features we have added only a small portion require metastore changes.
>>>>>
>>>>> As long as a couple active users have commit on hive and the suggested hcatalog subproject I do not think not having commit will be a roadblock in moving hive forward.
>>>>>
>>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates <ga...@hortonworks.com> wrote:
>>>>>
>>>>>> I am not sure where we are on this discussion. So far those who have chimed in seemed generally positive (Namit, Edward, Clark, Alexander). Namit and I have different visions for what the committership might look like, so I'd like to hear from other Hive PMC members what their view is on this. I have to say from an HCatalog perspective the proposition is much less attractive without some commit rights.
>>>>>>
>>>>>> On a related note, people should be aware of these threads in the Incubator list:
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E
>>>>>>
>>>>>> For those not inclined to read all the mails in the threads I will summarize (though I urge all PMC members of Hive and PPMC members of HCat to read both mail threads because this is highly relevant to what we are discussing). There are two salient points in these threads:
>>>>>>
>>>>>> 1) It is not wise to build a subproject that is distinct from the main project in the sense that it has separate community members interested in it. Bertrand, Arun, Chris Mattman, and Greg Stein all spoke against this, and all are long time Apache contributors with a lot of experience. They were all of the opinion that it was reasonable for one project to release separate products.
>>>>>>
>>>>>> 2) It is not wise to have committers that have access to parts of a project but not others. Greg and Bertrand argued (and Arun seemed to imply) that splitting up committer lists by sections of the code did not work out well.
>>>>>>
>>>>>> These insights cause me to question what we mean by subproject. I had originally envisioned something that looked like Pig and Hive did when they were subprojects of Hadoop. But this violates both 1 and 2 above.
>>>>>> Given this input from many of the "wise old timers" of Apache I think we should consider what we mean when we say subproject and how tightly we are willing to integrate these projects. Personally I think it makes sense to continue to pursue integration, as I think HCat is really a set of interfaces on top of Hive and it makes sense to coalesce those into one project. I guess this would mean HCat becomes just another set of jars that Hive releases when it releases, rather than a stand alone entity. But I'm curious to hear what others think.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>>>>>>
>>>>>>> The same criteria should be applied to all Hive committers. Only a committer should be able to commit code. I don't think we should bend this rule. Metastore is not a separate project, but an integral part of hive.
>>>>>>>
>>>>>>> -namit
>>>>>>>
>>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>
>>>>>>>> I would suggest looking over the patch history of HCat committers. I think most of them have already contributed a number of patches to the metastore. All are certainly aware of how to run Hive unit tests and have an understanding of how Hive works. So I don't think it's fair to say they would be unsafe with access to the metastore. And the Hive PMC is there to assure this does not happen. If there are issues I am sure they can deal with them.
>>>>>>>>
>>>>>>>> Alan.
>>>>>>>>
>>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>>>>>>>>
>>>>>>>>> Alan, that would not be a good idea. Metastore code is part of hive code, and it would be safer if only Hive committers had commit access to that.
>>>>>>>>>
>>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>>>>>>>>>>
>>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project. The enhancements/bugs in the serde/metastore areas can indirectly benefit the hive community, and it will be easier for the fix to be in one place. Having said that, I don't see serde/metastore moving out of hive into a separate component. Things are tied too closely together. I am assuming that no new committers would be automatically added to Hive as part of this, and both Hive and HCatalog will continue to have their own committers.
>>>>>>>>>>
>>>>>>>>>> One thing in this we'd like to discuss is the HCatalog committers having commit access to the metastore sections of Hive code. That doesn't mean it has to move into HCatalog's code base. But more and more the fixes and changes we're doing in HCatalog are really in Hive's metastore. So we believe it would make sense to give HCat committers access to that component as well as HCat.
>>>>>>>>>>
>>>>>>>>>> Alan.
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -namit
>>>>>>>>>>>
>>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator.
>>>>>>>>>>>> Given the heavy dependence of HCatalog on Hive the HCatalog community agreed it made sense to explore graduating from the Incubator to become a subproject of Hive (see http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E and http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E ).
>>>>>>>>>>>>
>>>>>>>>>>>> To help both communities understand what HCatalog is and hopes to become we also developed a roadmap that summarizes HCatalog's current features, planned features, and other possible features under discussion: https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap
>>>>>>>>>>>>
>>>>>>>>>>>> So we are now approaching you to see if there is agreement in the Hive community that HCatalog graduating into Hive would make sense.
>>>>>>>>>>>>
>>>>>>>>>>>> Alan.