Merging HCatalog into Hive

Alan Gates Fri, 22 Feb 2013 17:13:04 -0800

Alright, our vote has passed, so it's time to get on with merging HCatalog into 
Hive.  Here are the things I can think of that we need to deal with.  Please add 
any issues I've missed:


1) Moving the code
2) Dealing with domain names in the code
3) The mailing lists
4) The JIRA
5) The website
6) Committer rights
7) Make a proposal for how HCat is released going forward
8) Publish an FAQ 

Proposals for how we handle these:
Below is an approach for each of the items above.  Feedback welcome.

1) Moving the code
I propose that HCat move into a subdirectory of Hive.  This fits nicely into 
Hive's structure since it already has metastore, ql, etc.  We'd just add 
'hcatalog' as a new directory.  This directory would contain hcatalog as it is 
today.  HCat does not follow Hive's standard build model, so we'd need to do 
some work so that building Hive also builds HCat, but this should be minimal.

2) Dealing with domain names
HCat code is currently under org.apache.hcatalog.  Do we want to change it?  In 
time we probably should change it to match the rest of Hive 
(org.apache.hadoop.hive.hcatalog).  We need to do this in a backward compatible 
way.  I propose we leave it as is for now.  If we decide to change it in the 
future, we can move the actual code to org.apache.hadoop.hive.hcatalog and 
create shell classes under org.apache.hcatalog.
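
A minimal sketch of the shell-class idea, under stated assumptions: the class 
names below (RelocatedSchema, LegacySchema) are hypothetical and not taken from 
HCatalog, and the two classes are nested in one file only so the example 
compiles on its own.  In a real tree the first would sit in 
org.apache.hadoop.hive.hcatalog and the second in org.apache.hcatalog.

```java
// Illustration only; none of these names exist in HCatalog.
public class PackageShimSketch {

    // Stands in for a class whose code has moved to the new package
    // (org.apache.hadoop.hive.hcatalog in the proposal).
    static class RelocatedSchema {
        private final String name;

        RelocatedSchema(String name) {
            this.name = name;
        }

        String describe() {
            return "schema:" + name;
        }
    }

    // Stands in for the shell class left behind under the old package
    // (org.apache.hcatalog). It adds no behavior of its own; it only
    // keeps the old class name resolvable for existing callers.
    @Deprecated
    static class LegacySchema extends RelocatedSchema {
        LegacySchema(String name) {
            super(name);
        }
    }

    public static void main(String[] args) {
        // Client code written against the old name still compiles.
        LegacySchema old = new LegacySchema("events");
        // The same object is usable wherever the relocated type is expected.
        RelocatedSchema moved = old;
        System.out.println(moved.describe());
    }
}
```

Subclassing is only one way to build such a shell.  It does not work for final 
classes, and static methods would need explicit forwarding, so the real 
approach would have to be decided class by class.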

3) The mailing lists
Given that our goal is to merge the projects and not create a subproject, we 
should merge the mailing lists rather than keep HCat-specific lists.  We can 
ask infra to remove hcatalog-*@incubator.apache.org and forward any new mail to 
the appropriate Hive lists.  We need to find out if they can auto-subscribe 
people from the HCat lists to the Hive lists.  Given that traffic on the Hive 
lists is an order of magnitude higher, we should warn people before we 
auto-subscribe them and give them a chance to opt out.

4) JIRA
We can create an hcatalog component in Hive's JIRA.  All new HCat issues could 
be filed there.  I don't know if there's a way to upload existing JIRAs into 
Hive's JIRA, but I think it would be better to leave them where they are.  We 
should see if infra can turn off the ability to create new JIRAs in hcatalog.

5) Website
We will need to integrate HCatalog's website with Hive's.  This should be easy 
except for the documentation.  HCat uses Forrest for docs, while Hive uses a 
wiki.  We will need to put links under 'Documentation' for older versions of 
HCat docs so users can find them.  How docs are handled for the next version 
of HCatalog depends, I think, on the answer to question 7 (next release of 
HCat), but I propose that HCat conform to the way Hive does docs on the wiki.  
That said, I would strongly encourage the HCat docs to be version specific 
(that is, have a set of wiki pages for each version).  
incubator.apache.org/hcatalog should be changed to forward to hive.apache.org.

6) Committer rights
Carl will need to set up committer rights for all the new HCat committers.  
Based on our discussion of making active HCat committers Hive submodule 
committers this would add the following set:  Alan, Sushanth, Francis, Daniel, 
Vandana, Travis, and Mithun.  Ashutosh and Paul are already Hive committers, 
and neither Devaraj nor Mac has been active in HCat in over a year.

7) Future releases
We need to discuss how future releases will happen, as I think this will help 
developers and users know how to respond to the merge.  I propose that HCat 
will simply become part of future Hive releases.  Thus Hive 0.11 (or whatever 
the next major release is) will include HCatalog.  If there are issues found we 
may need to make HCatalog 0.5.x releases from Hive, which should be fine.  But 
I propose there would not be an HCat 0.6.  To be clear, I am not proposing that 
HCat functionality would be subsumed into Hive jars, just that the existing 
hcat jars would become part of Hive's release.

8) Communicate all of this
We should put up an FAQ page that has this information and tracks our progress 
while we work on getting these things done.

Alan.
