Alan, I think your proposal sounds great.

--travis

On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <ga...@hortonworks.com> wrote:
> Carl, speaking just for myself and not as a representative of the HCat PPMC 
> at this point, I am coming to agree with you that HCat integrating with Hive 
> fully makes more sense.
>
> However, this makes the committer question even thornier.  Travis and Namit, 
> I think the shepherd proposal needs to lay out a clear and time bounded path 
> to committership for HCat committers.  Having HCat committers as second class 
> Hive citizens for the long run will not be healthy.  I propose the following 
> as a starting point for discussion:
>
> All active HCat committers (those who have contributed or committed a patch 
> in the last 6 months) will be made committers in the HCat portion only of 
> Hive.  In addition those committers will be assigned a particular shepherd 
> who is a current Hive committer and who will be responsible for mentoring 
> them towards full Hive committership.  As a part of this mentorship the HCat 
> committer will review patches of other contributors, contribute patches to 
> Hive (both inside and outside of HCatalog), respond to user issues on the 
> mailing lists, etc.  It is intended that as a result of this mentorship 
> program HCat committers can become full Hive committers in 6-9 months.  No 
> new HCat only committers will be elected in Hive after this.  All Hive 
> committers will automatically also have commit rights on HCatalog.
>
> Alan.
>
> On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>
>> On a functional level I don't think there is going to be much of a
>> difference between the subproject option proposed by Travis and the other
>> option where HCatalog becomes a TLP. In both cases HCatalog and Hive will
>> have separate committers, separate code repositories, separate release
>> cycles, and separate project roadmaps. Aside from ASF bureaucracy, I think
>> the only major difference between the two options is that the subproject
>> route will give the rest of the community the false impression that the two
>> projects have coordinated roadmaps and a process to prevent overlapping
>> functionality from appearing in both projects. Consequently, if these are
>> the only two options then I would prefer that HCatalog become a TLP.
>>
>> On the other hand, I also agree with many of the sentiments that have
>> already been expressed in this thread, namely that the two projects are
>> closely related and that it would benefit the community at large if the two
>> projects could be brought closer together. Up to this point the major
>> source of pain for the HCatalog team has been the frequent necessity of
>> making changes on both the Hive and HCatalog sides when implementing new
>> features in HCatalog. This situation is compounded by the ASF requirement
>> that release artifacts may not depend on snapshot artifacts from other ASF
>> projects. Furthermore, if Hive adds a dependency on HCatalog then it will
>> be subject to these same problems (in addition to the gross circular
>> dependency!).
>>
>> I think the best way to avoid these problems is for HCatalog to become a
>> Hive submodule. In this scenario HCatalog would exist as a subdirectory in
>> the Hive repository and would be distributed as a Hive artifact in future
>> Hive releases. In addition to solving the problems I mentioned earlier, I
>> think this would also help to assuage the concerns of many Hive committers
>> who don't want to see the MetaStore split out into a separate project.
>>
>> Thanks.
>>
>> Carl
>>
>> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <nj...@fb.com> wrote:
>>
>>> I am fine with this. Any hive committer who wants to volunteer to be
>>> an hcat shepherd is welcome.
>>>
>>>
>>>
>>> On 12/14/12 7:01 AM, "Travis Crawford" <traviscrawf...@gmail.com> wrote:
>>>
>>>> Thanks for reviving this thread. Reviewing the comments everyone seems
>>>> to agree HCatalog makes sense as a Hive subproject. I think that's
>>>> great news for the Hadoop community.
>>>>
>>>> The discussion seems to have turned to one of committer permissions. I
>>>> agree with the Hive folks' sentiment that it's something that must be
>>>> earned. That said, I've found it challenging at times getting patches
>>>> into Hive that would help earn taking on a hive committer
>>>> responsibility.
>>>>
>>>> Proposal: if a couple hive committers can volunteer to be hcat
>>>> shepherds, we can work with the shepherds when making hive changes in
>>>> a timely manner. Conversely, we can help shepherd any hive committers
>>>> who are interested in working more with hcat. There are certainly
>>>> benefits to cross-committership, and this approach could help each
>>>> other build a history of meaningful contributions and earn the
>>>> privilege & responsibility of being committers.
>>>>
>>>> Thoughts?
>>>>
>>>> --travis
>>>>
>>>>
>>>>
>>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo <edlinuxg...@gmail.com>
>>>> wrote:
>>>>> I was initially hesitant about hcatalog, mostly because I imagined we
>>>>> would end up in a spot very similar to this.
>>>>>
>>>>> Namely, the hcatalog folks are interested in making a metastore to
>>>>> support pig, hive, and map reduce. However, I get the impression that
>>>>> many in hive do not care much to have a metastore that caters to
>>>>> everyone. Their needs are only based on what hive needs, which I
>>>>> believe is the wrong way to look at this situation.
>>>>>
>>>>> I thought to reply to this thread because I have been following this
>>>>> Jira:
>>>>> https://issues.apache.org/jira/browse/HIVE-3752
>>>>>
>>>>> On a high level I do not like this duplication of effort and code. If
>>>>> hive is compatible with hcatalog I do not see why we put off merging
>>>>> the two at all. Hive users would get an immediate benefit if Hive used
>>>>> hcatalog with no apparent downside. Meanwhile we are putting this off
>>>>> and staying in this awkward transition phase.
>>>>>
>>>>> Personally, I do not have a problem being a hive committer and not
>>>>> having hcatalog commit. None of the hive work I have done has ever
>>>>> touched the metastore. Also, of the thousands of jiras and features we
>>>>> have added only a small portion require metastore changes.
>>>>>
>>>>> As long as a couple active users have commit on hive and the suggested
>>>>> hcatalog subproject I do not think not having commit will be a
>>>>> roadblock in moving hive forward.
>>>>>
>>>>>
>>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates <ga...@hortonworks.com>
>>>>> wrote:
>>>>>
>>>>>> I am not sure where we are on this discussion.  So far those who have
>>>>>> chimed in seemed generally positive (Namit, Edward, Clark, Alexander).
>>>>>> Namit and I have different visions for what the committership might
>>>>>> look like, so I'd like to hear from other Hive PMC members what their
>>>>>> view is on this.  I have to say from an HCatalog perspective the
>>>>>> proposition is much less attractive without some commit rights.
>>>>>>
>>>>>> On a related note, people should be aware of these threads in the
>>>>>> Incubator list:
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E
>>>>>>
>>>>>> For those not inclined to read all the mails in the threads I will
>>>>>> summarize (though I urge all PMC members of Hive and PPMC members of
>>>>>> HCat to read both mail threads because this is highly relevant to
>>>>>> what we are discussing).  There are two salient points in these
>>>>>> threads:
>>>>>>
>>>>>> 1) It is not wise to build a subproject that is distinct from the main
>>>>>> project in the sense that it has separate community members interested
>>>>>> in it.  Bertrand, Arun, Chris Mattman, and Greg Stein all spoke
>>>>>> against this, and all are long time Apache contributors with a lot of
>>>>>> experience.  They were all of the opinion that it was reasonable for
>>>>>> one project to release separate products.
>>>>>>
>>>>>> 2) It is not wise to have committers that have access to parts of a
>>>>>> project but not others.  Greg and Bertrand argued (and Arun seemed to
>>>>>> imply) that splitting up committer lists by sections of the code did
>>>>>> not work out well.
>>>>>>
>>>>>> These insights cause me to question what we mean by subproject.  I had
>>>>>> originally envisioned something that looked like Pig and Hive did when
>>>>>> they were subprojects of Hadoop.  But this violates both 1 and 2
>>>>>> above.  Given this input from many of the "wise old timers" of Apache
>>>>>> I think we should consider what we mean when we say subproject and how
>>>>>> tightly we are willing to integrate these projects.  Personally I
>>>>>> think it makes sense to continue to pursue integration, as I think
>>>>>> HCat is really a set of interfaces on top of Hive and it makes sense
>>>>>> to coalesce those into one project.  I guess this would mean HCat
>>>>>> becomes just another set of jars that Hive releases when it releases,
>>>>>> rather than a stand alone entity.  But I'm curious to hear what others
>>>>>> think.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>>>>>>
>>>>>>> The same criteria should be applied to all Hive committers. Only a
>>>>>>> committer should be able to commit code.
>>>>>>> I don't think we should bend this rule. Metastore is not a separate
>>>>>>> project, but an integral part of hive.
>>>>>>>
>>>>>>> -namit
>>>>>>>
>>>>>>>
>>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>
>>>>>>>> I would suggest looking over the patch history of HCat committers.  I
>>>>>>>> think most of them have already contributed a number of patches to the
>>>>>>>> metastore.  All are certainly aware of how to run Hive unit tests and
>>>>>>>> have an understanding of how Hive works.  So I don't think it's fair to
>>>>>>>> say they would be unsafe with access to the metastore.  And the Hive PMC
>>>>>>>> is there to assure this does not happen.  If there are issues I am sure
>>>>>>>> they can deal with them.
>>>>>>>>
>>>>>>>> Alan.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>>>>>>>>
>>>>>>>>> Alan, that would not be a good idea. Metastore code is part of hive
>>>>>>>>> code, and it would be safer if only Hive committers had commit access
>>>>>>>>> to that.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>>>>>>>>>>
>>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project. The
>>>>>>>>>>> enhancements/bugs in the serde/metastore areas can indirectly
>>>>>>>>>>> benefit the hive community, and it will be easier for the fix to be
>>>>>>>>>>> in one place. Having said that, I don't see serde/metastore
>>>>>>>>>>> moving out of hive into a separate component. Things are tied too
>>>>>>>>>>> closely together. I am assuming that no new committers would
>>>>>>>>>>> be automatically added to Hive as part of this, and both Hive and
>>>>>>>>>>> HCatalog will continue to have their own committers.
>>>>>>>>>>
>>>>>>>>>> One thing in this we'd like to discuss is the HCatalog committers
>>>>>>>>>> having commit access to the metastore sections of Hive code.  That
>>>>>>>>>> doesn't mean it has to move into HCatalog's code base.  But more and
>>>>>>>>>> more the fixes and changes we're doing in HCatalog are really in
>>>>>>>>>> Hive's metastore.  So we believe it would make sense to give HCat
>>>>>>>>>> committers access to that component as well as HCat.
>>>>>>>>>>
>>>>>>>>>> Alan.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -namit
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <ga...@hortonworks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Hive community.  It is time for HCatalog to graduate from the
>>>>>>>>>>>> Apache Incubator.  Given the heavy dependence of HCatalog on Hive the
>>>>>>>>>>>> HCatalog community agreed it made sense to explore graduating from
>>>>>>>>>>>> the Incubator to become a subproject of Hive (see
>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E
>>>>>>>>>>>> and
>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E ).
>>>>>>>>>>>> To help both communities understand what HCatalog is and hopes
>>>>>>>>>>>> to become we also developed a roadmap that summarizes HCatalog's
>>>>>>>>>>>> current features, planned features, and other possible features under
>>>>>>>>>>>> discussion:
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap
>>>>>>>>>>>>
>>>>>>>>>>>> So we are now approaching you to see if there is agreement in the
>>>>>>>>>>>> Hive community that HCatalog graduating into Hive would make sense.
>>>>>>>>>>>>
>>>>>>>>>>>> Alan.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>
>>>
>
