I agree with Ashish.

When Hcat becomes a subproject of Hive, all Hcat committers should
immediately become Hive committers.

After all, that worked well for Hadoop, where all Hadoop committers can
commit to all Hadoop code (Common/HDFS/MapReduce). Not all do; most focus
only on their areas of expertise and the portions of the codebase they
know well.

- milind

---
Milind Bhandarkar
Chief Scientist,
Machine Learning Platforms,
Greenplum, A Division of EMC
+1-650-523-3858 (W)
+1-408-666-8483 (C)





On 12/20/12 5:58 AM, "Ashish Thusoo" <athu...@qubole.com> wrote:

>Actually, I don't understand why getting Hcat folks as committers on Hive
>is a problem. Hive itself became a subproject of Hadoop when it started,
>with all the Hive committers becoming Hadoop committers. And of course
>everyone maintained the discipline of committing only in parts of the code
>that they understood and had worked on. Some of the committers from Hive
>ended up becoming Hadoop committers; others who worked only on Hive ended
>up leaving the Hadoop committers list once Hive became a TLP. So why put
>in these arguments about process when the end result would be beneficial
>to the community and to the project? Would Hive not benefit if some folks
>from Hcat start working on Hive proper as well, of course under the
>guidance of Hive mentors etc.? Would the project not benefit in the long
>run if Hcat is brought in and some day becomes the default metastore for
>Hive? I mean, if there are so many long-term benefits from this, then why
>focus on control and code safety, which I think any responsible committer
>knows how to navigate and for which there are well-understood best
>practices? And why can't a committer be booted out if he/she is breaking
>the discipline and really nosing in places which he/she does not
>understand?
>
>I mean, if we agree that directionally Hcat being a part of Hive makes
>sense, then why don't we try to get rid of the procedural elements that
>would only slow down that transition? If there is angst on the Hive
>committers' side about specific people on the Hcat committers list (are
>there any?), then I think that should be addressed on a case-by-case
>basis, but why enforce a general rule? In the same vein, why have a rule
>saying that in 6-9 months a Hcat committer becomes a Hive committer? How
>is that helpful? If they are changing the Hcat subproject in Hive, are
>they not already Hive committers? And if they gain the expertise to review
>and commit code in the SemanticAnalyzer in a few months, should they not
>be able to do that before 9 months are over? And if they don't get that
>expertise in 9 months, would they really review and commit anything in the
>SemanticAnalyzer? I mean, there are Hive committers who don't touch that
>piece of code today, no?
>
>Ashish
>
>
>On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain <nj...@fb.com> wrote:
>
>> I don't agree with the proposal. It is impractical to have a Hcat
>> committer with commit access to Hcat-only portions of Hive. We cannot
>> guarantee that a Hcat committer will become a Hive committer in 6-9
>> months; that depends on what they do in the next 6-9 months.
>>
>> The current Hcat committers should spend more time reviewing patches,
>> work on non-Hcat areas in Hive, and then gradually become Hive
>> committers. They should not be given any preferential treatment, and the
>> process should be the same as it would be for any other Hive contributor
>> currently. Given the expertise of the Hcat committers, they should
>> be in line to become Hive committers if they continue to work in Hive,
>> but that cannot be guaranteed. I agree that some Hive committers should
>> try to help with the existing Hcat patches, but again that is voluntary,
>> and different committers cannot be assigned to different parts of the
>> code.
>>
>> Thanks,
>> -namit
>>
>>
>>
>>
>>
>>
>>
>> On 12/20/12 1:03 AM, "Carl Steinbach" <cwsteinb...@gmail.com> wrote:
>>
>> >Alan's proposal sounds like a good idea to me.
>> >
>> >+1
>> >
>> >On Dec 18, 2012 5:36 PM, "Travis Crawford" <traviscrawf...@gmail.com>
>> >wrote:
>> >
>> >> Alan, I think your proposal sounds great.
>> >>
>> >> --travis
>> >>
>> >> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <ga...@hortonworks.com>
>> >>wrote:
>> >> > Carl, speaking just for myself and not as a representative of the HCat
>> >> > PPMC at this point, I am coming to agree with you that HCat integrating
>> >> > with Hive fully makes more sense.
>> >> >
>> >> > However, this makes the committer question even thornier.  Travis and
>> >> > Namit, I think the shepherd proposal needs to lay out a clear and
>> >> > time-bounded path to committership for HCat committers.  Having HCat
>> >> > committers as second-class Hive citizens for the long run will not be
>> >> > healthy.  I propose the following as a starting point for discussion:
>> >> >
>> >> > All active HCat committers (those who have contributed or committed a
>> >> > patch in the last 6 months) will be made committers in the HCat portion
>> >> > only of Hive.  In addition, those committers will be assigned a
>> >> > particular shepherd who is a current Hive committer and who will be
>> >> > responsible for mentoring them towards full Hive committership.  As a
>> >> > part of this mentorship the HCat committer will review patches of other
>> >> > contributors, contribute patches to Hive (both inside and outside of
>> >> > HCatalog), respond to user issues on the mailing lists, etc.  It is
>> >> > intended that as a result of this mentorship program HCat committers
>> >> > can become full Hive committers in 6-9 months.  No new HCat-only
>> >> > committers will be elected in Hive after this.  All Hive committers
>> >> > will automatically also have commit rights on HCatalog.
>> >> >
>> >> > Alan.
>> >> >
>> >> > On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>> >> >
>> >> >> On a functional level I don't think there is going to be much of a
>> >> >> difference between the subproject option proposed by Travis and
>>the
>> >> other
>> >> >> option where HCatalog becomes a TLP. In both cases HCatalog and
>>Hive
>> >> will
>> >> >> have separate committers, separate code repositories, separate
>> >>release
>> >> >> cycles, and separate project roadmaps. Aside from ASF
>>bureaucracy, I
>> >> think
>> >> >> the only major difference between the two options is that the
>> >>subproject
>> >> >> route will give the rest of the community the false impression
>>that
>> >>the
>> >> two
>> >> >> projects have coordinated roadmaps and a process to prevent
>> >>overlapping
>> >> >> functionality from appearing in both projects. Consequently, if these
>> >> >> are the only two options then I would prefer that HCatalog become a
>> >> >> TLP.
>> >> >>
>> >> >> On the other hand, I also agree with many of the sentiments that
>>have
>> >> >> already been expressed in this thread, namely that the two
>>projects
>> >>are
>> >> >> closely related and that it would benefit the community at large
>>if
>> >>the
>> >> two
>> >> >> projects could be brought closer together. Up to this point the
>>major
>> >> >> source of pain for the HCatalog team has been the frequent
>>necessity
>> >>of
>> >> >> making changes on both the Hive and HCatalog sides when
>>implementing
>> >>new
>> >> >> features in HCatalog. This situation is compounded by the ASF
>> >> requirement
>> >> >> that release artifacts may not depend on snapshot artifacts from
>> >>other
>> >> ASF
>> >> >> projects. Furthermore, if Hive adds a dependency on HCatalog then
>>it
>> >> will
>> >> >> be subject to these same problems (in addition to the gross
>>circular
>> >> >> dependency!).
>> >> >>
>> >> >> I think the best way to avoid these problems is for HCatalog to
>> >>become a
>> >> >> Hive submodule. In this scenario HCatalog would exist as a
>> >>subdirectory
>> >> in
>> >> >> the Hive repository and would be distributed as a Hive artifact in
>> >> future
>> >> >> Hive releases. In addition to solving the problems I mentioned
>> >>earlier,
>> >> I
>> >> >> think this would also help to assuage the concerns of many Hive
>> >> committers
>> >> >> who don't want to see the MetaStore split out into a separate
>> >>project.
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> Carl
>> >> >>
>> >> >> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <nj...@fb.com> wrote:
>> >> >>
>> >> >>> I am fine with this. Any Hive committer who wants to volunteer to be
>> >> >>> an hcat shepherd is welcome.
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On 12/14/12 7:01 AM, "Travis Crawford" <traviscrawf...@gmail.com>
>> >> wrote:
>> >> >>>
>> >> >>>> Thanks for reviving this thread. Reviewing the comments everyone
>> >>seems
>> >> >>>> to agree HCatalog makes sense as a Hive subproject. I think
>>that's
>> >> >>>> great news for the Hadoop community.
>> >> >>>>
>> >> >>>> The discussion seems to have turned to one of committer permissions.
>> >> >>>> I agree with the Hive folks' sentiment that it's something that must
>> >> >>>> be earned. That said, I've found it challenging at times getting
>> >> >>>> patches into Hive that would help earn taking on a hive committer
>> >> >>>> responsibility.
>> >> >>>>
>> >> >>>> Proposal: if a couple of hive committers can volunteer to be hcat
>> >> >>>> shepherds, we can work with the shepherds when making hive changes
>> >> >>>> in a timely manner. Conversely, we can help shepherd any hive
>> >> >>>> committers who are interested in working more with hcat. There are
>> >> >>>> certainly benefits to cross-committership, and this approach could
>> >> >>>> help each side build a history of meaningful contributions and earn
>> >> >>>> the privilege & responsibility of being committers.
>> >> >>>>
>> >> >>>> Thoughts?
>> >> >>>>
>> >> >>>> --travis
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo <
>> >> edlinuxg...@gmail.com>
>> >> >>>> wrote:
>> >> >>>>> I was initially hesitant about hcatalog, mostly because I imagined
>> >> >>>>> we would end up in a spot very similar to this.
>> >> >>>>>
>> >> >>>>> Namely, the hcatalog folks are interested in making a metastore to
>> >> >>>>> support pig, hive, and map reduce. However, I get the impression
>> >> >>>>> that many in hive do not care much to have a metastore that caters
>> >> >>>>> to everyone. Their needs are only based on what hive needs, which I
>> >> >>>>> believe is the wrong way to look at this situation.
>> >> >>>>>
>> >> >>>>> I thought to reply to this thread because I have been following
>> >> >>>>> this Jira:
>> >> >>>>> https://issues.apache.org/jira/browse/HIVE-3752
>> >> >>>>>
>> >> >>>>> On a high level I do not like this duplication of effort and
>> >>code. If
>> >> >>>>> hive
>> >> >>>>> is compatible with hcatalog I do not see why we put off merging
>> >>the
>> >> two
>> >> >>>>> at
>> >> >>>>> all. Hive users would get an immediate benefit if Hive used
>> >>hcatalog
>> >> >>>>> with
>> >> >>>>> no apparent downside. Meanwhile we are putting this off and
>> >>staying
>> >> in
>> >> >>>>> this
>> >> >>>>> awkward transition phase.
>> >> >>>>>
>> >> >>>>> Personally, I do not have a problem being a hive committer and
>>not
>> >> >>>>> having
>> >> >>>>> hcatalog commit. None of the hive work I have done has ever
>> >>touched
>> >> the
>> >> >>>>> metastore. Also of the thousands of jiras and features we have
>> >>added
>> >> >>>>> only a
>> >> >>>>> small portion require metastore changes.
>> >> >>>>>
>> >> >>>>> As long as a couple active users have commit on hive and the
>> >> suggested
>> >> >>>>> hcatalog subproject I do not think not having commit will be a
>> >> >>>>> roadblock in
>> >> >>>>> moving hive forward.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates
>><ga...@hortonworks.com
>> >
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> I am not sure where we are on this discussion.  So far those
>>who
>> >> have
>> >> >>>>>> chimed in seemed generally positive (Namit, Edward, Clark,
>> >> Alexander).
>> >> >>>>>> Namit and I have different visions for what the committership
>> >>might
>> >> >>>>>> look
>> >> >>>>>> like, so I'd like to hear from other Hive PMC members what
>>their
>> >> view
>> >> >>>>>> is on
>> >> >>>>>> this.  I have to say from an HCatalog perspective the
>> >>proposition is
>> >> >>>>>> much
>> >> >>>>>> less attractive without some commit rights.
>> >> >>>>>>
>> >> >>>>>> On a related note, people should be aware of these threads in
>>the
>> >> >>>>>> Incubator list:
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E
>> >> >>>>>>
>> >> >>>>>> For those not inclined to read all the mails in the threads I
>> >>will
>> >> >>>>>> summarize (though I urge all PMC members of Hive and PPMC
>> >>members of
>> >> >>>>>> HCat
>> >> >>>>>> to read both mail threads because this is highly relevant to
>> >>what we
>> >> >>>>>> are
>> >> >>>>>> discussing).  There are two salient points in these threads:
>> >> >>>>>>
>> >> >>>>>> 1) It is not wise to build a subproject that is distinct from
>>the
>> >> main
>> >> >>>>>> project in the sense that it has separate community members
>> >> interested
>> >> >>>>>> in
>> >> >>>>>> it.  Bertrand, Arun, Chris Mattmann, and Greg Stein all spoke
>> >>against
>> >> >>>>>> this,
>> >> >>>>>> and all are long time Apache contributors with a lot of
>> >>experience.
>> >> >>>>>> They
>> >> >>>>>> were all of the opinion that it was reasonable for one
>>project to
>> >> >>>>>> release
>> >> >>>>>> separate products.
>> >> >>>>>>
>> >> >>>>>> 2) It is not wise to have committers that have access to parts
>> >>of a
>> >> >>>>>> project but not others.  Greg and Bertrand argued (and Arun
>> >>seemed
>> >> to
>> >> >>>>>> imply) that splitting up committer lists by sections of the
>>code
>> >>did
>> >> >>>>>> not
>> >> >>>>>> work out well.
>> >> >>>>>>
>> >> >>>>>> These insights cause me to question what we mean by
>>subproject.
>> >>I
>> >> had
>> >> >>>>>> originally envisioned something that looked like Pig and Hive
>>did
>> >> when
>> >> >>>>>> they
>> >> >>>>>> were subprojects of Hadoop.  But this violates both 1 and 2
>> >>above.
>> >> >>>>>> Given
>> >> >>>>>> this input from many of the "wise old timers" of Apache I
>>think
>> >>we
>> >> >>>>>> should
>> >> >>>>>> consider what we mean when we say subproject and how tightly
>>we
>> >>are
>> >> >>>>>> willing
>> >> >>>>>> to integrate these projects.  Personally I think it makes
>>sense
>> >>to
>> >> >>>>>> continue
>> >> >>>>>> to pursue integration, as I think HCat is really a set of
>> >>interfaces
>> >> >>>>>> on top
>> >> >>>>>> of Hive and it makes sense to coalesce those into one
>>project.  I
>> >> guess
>> >> >>>>>> this would mean HCat becomes just another set of jars that
>>Hive
>> >> >>>>>> releases
>> >> >>>>>> when it releases, rather than a stand alone entity.  But I'm
>> >> curious to
>> >> >>>>>> hear what others think.
>> >> >>>>>>
>> >> >>>>>> Alan.
>> >> >>>>>>
>> >> >>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>> >> >>>>>>
>> >> >>>>>>> The same criteria should be applied to all Hive committers. Only
>> >> >>>>>>> a committer should be able to commit code.
>> >> >>>>>>> I don't think we should bend this rule. Metastore is not a
>> >> >>>>>>> separate project, but an integral part of hive.
>> >> >>>>>>>
>> >> >>>>>>> -namit
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <ga...@hortonworks.com>
>> >>wrote:
>> >> >>>>>>>
>> >> >>>>>>>> I would suggest looking over the patch history of HCat
>> >>committers.
>> >> >>>>>> I
>> >> >>>>>>>> think most of them have already contributed a number of
>> >>patches to
>> >> >>>>>> the
>> >> >>>>>>>> metastore.  All are certainly aware of how to run Hive unit
>> >>tests
>> >> >>>>>> and
>> >> >>>>>>>> have an understanding of how Hive works.  So I don't think
>>it's
>> >> >>>>>> fair to
>> >> >>>>>>>> say they would be unsafe with access to the metastore.  And
>>the
>> >> >>>>>> Hive PMC
>> >> >>>>>>>> is there to assure this does not happen.  If there are
>>issues
>> >>I am
>> >> >>>>>> sure
>> >> >>>>>>>> they can deal with them.
>> >> >>>>>>>>
>> >> >>>>>>>> Alan.
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> Alan, that would not be a good idea. Metastore code is
>>part of
>> >> hive
>> >> >>>>>>>>> code,
>> >> >>>>>>>>> and it
>> >> >>>>>>>>> would be safer if only Hive committers had commit access to
>> >>that.
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <ga...@hortonworks.com>
>> >>wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project.
>>The
>> >> >>>>>>>>>>> enhancements/bugs in the serde/metastore areas can
>> >>indirectly
>> >> >>>>>>>>>>> benefit the hive community, and it will be easier for the
>> >>fix
>> >> to
>> >> >>>>>> be
>> >> >>>>>> in
>> >> >>>>>>>>>>> one
>> >> >>>>>>>>>>> place. Having said that, I don't see serde/metastore
>> >> >>>>>>>>>>> moving out of hive into a separate component. Things are
>> >>tied
>> >> too
>> >> >>>>>>>>>>> closely
>> >> >>>>>>>>>>> together. I am assuming that no new committers would
>> >> >>>>>>>>>>> be automatically added to Hive as part of this, and both
>> >>Hive
>> >> and
>> >> >>>>>>>>>>> HCatalog
>> >> >>>>>>>>>>> will continue to have its own committers.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> One thing in this we'd like to discuss is the HCatalog
>> >> committers
>> >> >>>>>>>>>> having
>> >> >>>>>>>>>> commit access to the metastore sections of Hive code.
>>That
>> >> >>>>>> doesn't
>> >> >>>>>>>>>> mean
>> >> >>>>>>>>>> it has to move into HCatalog's code base.  But more and
>>more
>> >>the
>> >> >>>>>> fixes
>> >> >>>>>>>>>> and changes we're doing in HCatalog are really in Hive's
>> >> >>>>>> metastore.
>> >> >>>>>> So
>> >> >>>>>>>>>> we believe it would make sense to give HCat committers
>> >>access to
>> >> >>>>>> that
>> >> >>>>>>>>>> component as well as HCat.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Alan.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Thanks,
>> >> >>>>>>>>>>> -namit
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <ga...@hortonworks.com>
>> >> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>> Hello Hive community.  It is time for HCatalog to
>>graduate
>> >> from
>> >> >>>>>> the
>> >> >>>>>>>>>>>> Apache Incubator.  Given the heavy dependence of
>>HCatalog
>> >>on
>> >> >>>>>> Hive
>> >> >>>>>> the
>> >> >>>>>>>>>>>> HCatalog community agreed it made sense to explore
>> >>graduating
>> >> >>>>>> from
>> >> >>>>>>>>>>>> the
>> >> >>>>>>>>>>>> Incubator to become a subproject of Hive (see
>> >> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E
>> >> >>>>>>>>>>>> and
>> >> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E
>> >> >>>>>>>>>>>> ).  To help both communities understand what HCatalog is and
>> >> >>>>>>>>>>>> hopes to become we also developed a roadmap that summarizes
>> >> >>>>>>>>>>>> HCatalog's current features, planned features, and other
>> >> >>>>>>>>>>>> possible features under discussion:
>> >> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> So we are now approaching you to see if there is
>>agreement
>> >>in
>> >> >>>>>> the
>> >> >>>>>>>>>>>> Hive
>> >> >>>>>>>>>>>> community that HCatalog graduating into Hive would make
>> >>sense.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Alan.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>
>> >> >>>
>> >> >
>> >>
>>
>>
