I agree with Ashish. When Hcat becomes a subproject of Hive, all Hcat committers should immediately become Hive committers.
After all, that worked well for Hadoop, where all Hadoop committers can commit to all Hadoop code (common/HDFS/MapReduce), but not all do; instead they focus only on their areas of expertise and the portions of the codebase they are familiar with.

- milind

---
Milind Bhandarkar
Chief Scientist, Machine Learning Platforms, Greenplum, A Division of EMC
+1-650-523-3858 (W)
+1-408-666-8483 (C)

On 12/20/12 5:58 AM, "Ashish Thusoo" <athu...@qubole.com> wrote:

> Actually I don't understand why getting Hcat folks as committers on Hive is a problem. Hive itself became a subproject of Hadoop when it started with all the Hive committers becoming Hadoop committers. And of course everyone maintained the discipline that they commit in parts of the code that they understand and that they have worked on. Some of the committers from Hive ended up becoming Hadoop committers - others who worked only on Hive ended up leaving the Hadoop committers list once Hive became a TLP. So why put in these arguments about process when the end result would be beneficial to the community and to the project. Would Hive not benefit if some folks from Hcat start working on Hive proper as well - of course under the guidance of Hive mentors etc. Would the project not benefit in the long run if Hcat is brought in and some day becomes the default metastore for Hive. I mean if there are so many long term benefits from this then why focus on control and code safety which I think any responsible committer knows how to navigate and there are well understood best practices for that. And why can't a committer be booted out if he/she is breaking the discipline and really nosing in places which he/she does not understand.
>
> I mean if we agree that directionally Hcat being a part of Hive makes sense then why don't we try to get rid of the procedural elements that would only slow down that transition? If there is angst about specific people on Hcat committers list on the Hive committers side (are there any?), then I think that should be addressed on a case by case basis but why enforce a general rule. In the same vein why have a rule saying in 6-9 months a Hcat committer becomes a Hive committer - how is that helpful? If they are changing the Hcat subproject in Hive are they not already Hive committers? And if they gain the expertise to review and commit code in the SemanticAnalyzer in a few months should they not be able to do that before 9 months are over? And if they don't get that expertise in 9 months would they really review and commit anything in the SemanticAnalyzer - I mean there are Hive committers who don't touch that piece of code today. no?
>
> Ashish
>
> On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain <nj...@fb.com> wrote:
>
>> I don’t agree with the proposal. It is impractical to have a Hcat committer with commit access to Hcat only portions of Hive. We cannot guarantee that a Hcat committer will become a Hive committer in 6-9 months, that depends on what they do in the next 6-9 months.
>>
>> The current Hcat committers should spend more time in reviewing patches, work on non-Hcat areas in Hive, and then gradually become a hive committer. They should not be given any preferential treatment, and the process should be same as it would be for any other hive contributor currently. Given the expertise of the Hcat committers, they should be in line for becoming a hive committer if they continue to work in hive, but that cannot be guaranteed. I agree that some Hive committers should try and help the existing Hcat patches, and again that is voluntary and different committers cannot be assigned to different parts of the code.
>>
>> Thanks,
>> -namit
>>
>> On 12/20/12 1:03 AM, "Carl Steinbach" <cwsteinb...@gmail.com> wrote:
>>
>> > Alan's proposal sounds like a good idea to me.
>> >
>> > +1
>> >
>> > On Dec 18, 2012 5:36 PM, "Travis Crawford" <traviscrawf...@gmail.com> wrote:
>> >
>> >> Alan, I think your proposal sounds great.
>> >>
>> >> --travis
>> >>
>> >> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <ga...@hortonworks.com> wrote:
>> >> > Carl, speaking just for myself and not as a representative of the HCat PPMC at this point, I am coming to agree with you that HCat integrating with Hive fully makes more sense.
>> >> >
>> >> > However, this makes the committer question even thornier. Travis and Namit, I think the shepherd proposal needs to lay out a clear and time bounded path to committership for HCat committers. Having HCat committers as second class Hive citizens for the long run will not be healthy. I propose the following as a starting point for discussion:
>> >> >
>> >> > All active HCat committers (those who have contributed or committed a patch in the last 6 months) will be made committers in the HCat portion only of Hive. In addition those committers will be assigned a particular shepherd who is a current Hive committer and who will be responsible for mentoring them towards full Hive committership. As a part of this mentorship the HCat committer will review patches of other contributors, contribute patches to Hive (both inside and outside of HCatalog), respond to user issues on the mailing lists, etc. It is intended that as a result of this mentorship program HCat committers can become full Hive committers in 6-9 months. No new HCat only committers will be elected in Hive after this. All Hive committers will automatically also have commit rights on HCatalog.
>> >> >
>> >> > Alan.
>> >> >
>> >> > On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>> >> >
>> >> >> On a functional level I don't think there is going to be much of a difference between the subproject option proposed by Travis and the other option where HCatalog becomes a TLP. In both cases HCatalog and Hive will have separate committers, separate code repositories, separate release cycles, and separate project roadmaps. Aside from ASF bureaucracy, I think the only major difference between the two options is that the subproject route will give the rest of the community the false impression that the two projects have coordinated roadmaps and a process to prevent overlapping functionality from appearing in both projects. Consequently, if these are the only two options then I would prefer that HCatalog become a TLP.
>> >> >>
>> >> >> On the other hand, I also agree with many of the sentiments that have already been expressed in this thread, namely that the two projects are closely related and that it would benefit the community at large if the two projects could be brought closer together. Up to this point the major source of pain for the HCatalog team has been the frequent necessity of making changes on both the Hive and HCatalog sides when implementing new features in HCatalog. This situation is compounded by the ASF requirement that release artifacts may not depend on snapshot artifacts from other ASF projects. Furthermore, if Hive adds a dependency on HCatalog then it will be subject to these same problems (in addition to the gross circular dependency!).
>> >> >>
>> >> >> I think the best way to avoid these problems is for HCatalog to become a Hive submodule. In this scenario HCatalog would exist as a subdirectory in the Hive repository and would be distributed as a Hive artifact in future Hive releases. In addition to solving the problems I mentioned earlier, I think this would also help to assuage the concerns of many Hive committers who don't want to see the MetaStore split out into a separate project.
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> Carl
>> >> >>
>> >> >> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <nj...@fb.com> wrote:
>> >> >>
>> >> >>> I am fine with this. Any hive committer who wants to volunteer to be a hcat shepherd is welcome.
>> >> >>>
>> >> >>> On 12/14/12 7:01 AM, "Travis Crawford" <traviscrawf...@gmail.com> wrote:
>> >> >>>
>> >> >>>> Thanks for reviving this thread. Reviewing the comments everyone seems to agree HCatalog makes sense as a Hive subproject. I think that's great news for the Hadoop community.
>> >> >>>>
>> >> >>>> The discussion seems to have turned to one of committer permissions. I agree with the Hive folks' sentiment that it's something that must be earned. That said, I've found it challenging at times getting patches into Hive that would help earn taking on a hive committer responsibility.
>> >> >>>>
>> >> >>>> Proposal: if a couple hive committers can volunteer to be hcat shepherds, we can work with the shepherds when making hive changes in a timely manner. Conversely, we can help shepherd any hive committers who are interested in working more with hcat. There are certainly benefits to cross-committership, and this approach could help each other build a history of meaningful contributions and earn the privilege & responsibility of being committers.
>> >> >>>>
>> >> >>>> Thoughts?
>> >> >>>>
>> >> >>>> --travis
>> >> >>>>
>> >> >>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> >> >>>>> I was initially hesitant about hcatalog, mostly because I imagined we would end up in a spot very similar to this.
>> >> >>>>>
>> >> >>>>> Namely the hcatalog folks are interested in making a metastore to support pig, hive, and map reduce. However I get the impression that many in hive do not care much to have a metastore that caters to everyone. Their needs are only based on what hive needs. Which I believe is the wrong way to look at this situation.
>> >> >>>>>
>> >> >>>>> I thought to reply to this thread because I have been following this Jira:
>> >> >>>>> https://issues.apache.org/jira/browse/HIVE-3752
>> >> >>>>>
>> >> >>>>> On a high level I do not like this duplication of effort and code. If hive is compatible with hcatalog I do not see why we put off merging the two at all. Hive users would get an immediate benefit if Hive used hcatalog with no apparent downside. Meanwhile we are putting this off and staying in this awkward transition phase.
>> >> >>>>>
>> >> >>>>> Personally, I do not have a problem being a hive committer and not having hcatalog commit. None of the hive work I have done has ever touched the metastore. Also of the thousands of jiras and features we have added only a small portion require metastore changes.
>> >> >>>>>
>> >> >>>>> As long as a couple active users have commit on hive and the suggested hcatalog subproject I do not think not having commit will be a roadblock in moving hive forward.
>> >> >>>>>
>> >> >>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates <ga...@hortonworks.com> wrote:
>> >> >>>>>
>> >> >>>>>> I am not sure where we are on this discussion. So far those who have chimed in seemed generally positive (Namit, Edward, Clark, Alexander). Namit and I have different visions for what the committership might look like, so I'd like to hear from other Hive PMC members what their view is on this. I have to say from an HCatalog perspective the proposition is much less attractive without some commit rights.
>> >> >>>>>>
>> >> >>>>>> On a related note, people should be aware of these threads in the Incubator list:
>> >> >>>>>>
>> >> >>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E
>> >> >>>>>>
>> >> >>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E
>> >> >>>>>>
>> >> >>>>>> For those not inclined to read all the mails in the threads I will summarize (though I urge all PMC members of Hive and PPMC members of HCat to read both mail threads because this is highly relevant to what we are discussing). There are two salient points in these threads:
>> >> >>>>>>
>> >> >>>>>> 1) It is not wise to build a subproject that is distinct from the main project in the sense that it has separate community members interested in it. Bertrand, Arun, Chris Mattman, and Greg Stein all spoke against this, and all are long time Apache contributors with a lot of experience. They were all of the opinion that it was reasonable for one project to release separate products.
>> >> >>>>>>
>> >> >>>>>> 2) It is not wise to have committers that have access to parts of a project but not others. Greg and Bertrand argued (and Arun seemed to imply) that splitting up committer lists by sections of the code did not work out well.
>> >> >>>>>>
>> >> >>>>>> These insights cause me to question what we mean by subproject. I had originally envisioned something that looked like Pig and Hive did when they were subprojects of Hadoop. But this violates both 1 and 2 above. Given this input from many of the "wise old timers" of Apache I think we should consider what we mean when we say subproject and how tightly we are willing to integrate these projects. Personally I think it makes sense to continue to pursue integration, as I think HCat is really a set of interfaces on top of Hive and it makes sense to coalesce those into one project. I guess this would mean HCat becomes just another set of jars that Hive releases when it releases, rather than a stand alone entity. But I'm curious to hear what others think.
>> >> >>>>>>
>> >> >>>>>> Alan.
>> >> >>>>>>
>> >> >>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>> >> >>>>>>
>> >> >>>>>>> The same criteria should be applied to all Hive committers. Only a committer should be able to commit code. I don't think we should bend this rule. Metastore is not a separate project, but an integral part of hive.
>> >> >>>>>>>
>> >> >>>>>>> -namit
>> >> >>>>>>>
>> >> >>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> I would suggest looking over the patch history of HCat committers. I think most of them have already contributed a number of patches to the metastore. All are certainly aware of how to run Hive unit tests and have an understanding of how Hive works. So I don't think it's fair to say they would be unsafe with access to the metastore. And the Hive PMC is there to assure this does not happen. If there are issues I am sure they can deal with them.
>> >> >>>>>>>>
>> >> >>>>>>>> Alan.
>> >> >>>>>>>>
>> >> >>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> Alan, that would not be a good idea. Metastore code is part of hive code, and it would be safer if only Hive committers had commit access to that.
>> >> >>>>>>>>>
>> >> >>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project. The enhancements/bugs in the serde/metastore areas can indirectly benefit the hive community, and it will be easier for the fix to be in one place. Having said that, I don't see serde/metastore moving out of hive into a separate component. Things are tied too closely together. I am assuming that no new committers would be automatically added to Hive as part of this, and both Hive and HCatalog will continue to have their own committers.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> One thing in this we'd like to discuss is the HCatalog committers having commit access to the metastore sections of Hive code. That doesn't mean it has to move into HCatalog's code base. But more and more the fixes and changes we're doing in HCatalog are really in Hive's metastore. So we believe it would make sense to give HCat committers access to that component as well as HCat.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Alan.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Thanks,
>> >> >>>>>>>>>>> -namit
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <ga...@hortonworks.com> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>> Hello Hive community. It is time for HCatalog to graduate from the Apache Incubator. Given the heavy dependence of HCatalog on Hive the HCatalog community agreed it made sense to explore graduating from the Incubator to become a subproject of Hive (see
>> >> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E
>> >> >>>>>>>>>>>> and
>> >> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E
>> >> >>>>>>>>>>>> ). To help both communities understand what HCatalog is and hopes to become we also developed a roadmap that summarizes HCatalog's current features, planned features, and other possible features under discussion:
>> >> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> So we are now approaching you to see if there is agreement in the Hive community that HCatalog graduating into Hive would make sense.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Alan.