Re: Allow other implementations of IMetaStoreClient in Hive

Alan Gates Tue, 15 Dec 2015 12:50:22 -0800

I think opening a JIRA is a good next step.

Alan.

Austin Lee <mailto:austin.t....@gmail.com>
December 15, 2015 at 11:19
Thank you so much Alan for your prompt responses and for the information you provided. I will have a look at the HBase work.
I am new to the process and it's not 100% clear to me, but the wiki seems to suggest I should use this forum to get to consensus on a proposal before creating a JIRA ticket. If the "why" is clear on my proposal, I would like to create a JIRA ticket and take this through the rest of the process via JIRA. Does that sound good?
Thanks,
Austin


Alan Gates <mailto:alanfga...@gmail.com>
December 15, 2015 at 11:04
For work along the same lines you should check out the HBase metastore work in Hive 2.0. It still uses the thrift server and RawStore but puts HBase behind it instead of an RDBMS. We did this because we found that most of the inefficiencies of Hive's metadata access had to do with the layout of the RDBMS and the way it was accessed. In the same work I built short-circuit options in to avoid using thrift and enable sharing of objects across HiveMetaStore and HiveMetaStoreClient.
On the backwards incompatibilities, yes, IMetaStoreClient evolves in lock step with the thrift interface. My point was we often add calls, add new fields to structs, etc. Your code would still compile in these cases; new features just wouldn't work. Given that a couple of major Hadoop support vendors now support rolling upgrades, there are devs interested in making sure that client version x works properly with server version x+1.
Still, we don't test for the use case you are proposing, so we could end up breaking your code without knowing it.
When I said it wasn't external, I meant we did not expect end users to write code against it (like, say, the UDF interface). Yes, it's external to the metastore package, as you point out.
Alan.

Austin Lee <mailto:austin.t....@gmail.com>
December 15, 2015 at 10:46
Yes, a more efficient implementation is what I am trying to achieve. I also want to retain the ability to talk to a remote metastore that is not necessarily thrift.
To be more precise, what I would like is a more efficient metastore. In looking at the current architecture, I came to the conclusion that there are three logical boundaries where I can inject an improved implementation or alternative to what Hive offers in the metastore space.
1) RawStore
I think the existing mechanism that Hive offers users to choose from major RDBMSes works fine. I suppose there's still room for improvement here, but the impact of those improvements would be limited to the storage aspects of metadata.
2) Thrift server
An alternative HiveMetaStore that talks Hive Metastore Thrift. It's almost a coin toss between this and #3, but I think for the reasons I will state below, #3 is preferable.
3) IMetaStoreClient
I feel this gives me the most freedom since I can be embedded or remote. I am not tied to the Thrift interface or the RawStore interface, if I choose to roll my own.
One thing that does concern me is your statement about IMetaStoreClient being an internal interface, which is true. Do the changes to this interface really happen ad hoc? Doesn't it evolve in lock step with the Thrift interface? If so, wouldn't backward compatibility guarantees for Thrift translate to backward compatibility guarantees for this interface as well? From the way it is used by Query Planning, I think it could be made an "external" interface that belongs in hive-metastore.
Alan Gates <mailto:alanfga...@gmail.com>
December 15, 2015 at 10:14
I don't see an issue with this, it seems fine. One caveat though is we see this as an internal interface and we change it all the time. I wouldn't want to be pushed into making backwards compatibility guarantees for IMetaStoreClient. Which means that if you develop a different implementation of it outside Hive it will likely break on every upgrade.
I don't understand your example use case. You can run Hive now without the thrift server, so I'm guessing that's not what you're really trying to do. Are you just interested in building a more efficient implementation or do you have another use case in mind?
Alan.

Austin Lee <mailto:austin.t....@gmail.com>
December 14, 2015 at 20:48
Hi,

I would like to propose a change that would make it possible for users to
choose an implementation of IMetaStoreClient via HiveConf, i.e.
hive-site.xml. Currently, in Hive the choice is hard coded to be
SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
There is no other direct reference to SessionHiveMetaStoreClient other than
the hard coded class name in Hive.java and the QL component operates only
on the IMetaStoreClient interface so the change would be minimal and it
would be quite similar to how an implementation of RawStore is specified
and loaded in hive-metastore. One use case this change would serve would
be one where a user wishes to use an implementation of this interface
without the dependency on the Thrift server. I would appreciate the
community's input and feedback on this proposal.
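
To make the idea concrete, here is a minimal, self-contained sketch of choosing an implementation class by name from configuration and instantiating it reflectively. It does not use Hive's actual classes: `MetaClient` stands in for IMetaStoreClient, `SessionClient` for the current hard-coded default, a plain `Map` for HiveConf, and the key `example.metastore.client.impl` is a hypothetical name chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientLoaderSketch {

    /** Hypothetical stand-in for IMetaStoreClient; the real interface is much larger. */
    public interface MetaClient {
        String describe();
    }

    /** Hypothetical stand-in for the default implementation used today. */
    public static class SessionClient implements MetaClient {
        @Override
        public String describe() { return "session"; }
    }

    /** Hypothetical alternative implementation supplied by a user. */
    public static class CustomClient implements MetaClient {
        @Override
        public String describe() { return "custom"; }
    }

    /** Hypothetical configuration key; not an existing Hive property. */
    public static final String IMPL_KEY = "example.metastore.client.impl";

    /**
     * Looks up the implementation class name in the configuration, falling
     * back to the default when the key is absent, and creates an instance
     * through its no-argument constructor.
     */
    public static MetaClient load(Map<String, String> conf) {
        String className = conf.getOrDefault(IMPL_KEY, SessionClient.class.getName());
        try {
            Class<? extends MetaClient> cls =
                Class.forName(className).asSubclass(MetaClient.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load client class " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(load(conf).describe());   // default implementation

        conf.put(IMPL_KEY, CustomClient.class.getName());
        System.out.println(load(conf).describe());   // configured implementation
    }
}
```

Callers only ever see the interface type, so swapping the implementation touches a single lookup. A real change would also need to decide which constructor arguments the chosen class receives, which this sketch leaves out.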

Thank you,
Austin
