Re: Allow other implementations of IMetaStoreClient in Hive

Austin Lee Tue, 15 Dec 2015 11:21:27 -0800

Thank you so much Alan for your prompt responses and for the information
you provided.  I will have a look at the HBase work.


I am new to the process and it's not 100% clear to me, but the wiki seems
to suggest I should use this forum to get to consensus on a proposal before
creating a JIRA ticket.  If the "why" is clear on my proposal, I would like
to create a JIRA ticket and take this through the rest of the process via
JIRA.  Does that sound good?

Thanks,
Austin

On Tue, Dec 15, 2015 at 11:04 AM, Alan Gates <alanfga...@gmail.com> wrote:

> For work along the same lines you should check out the HBase metastore
> work in Hive 2.0.  It still uses the thrift server and RawStore but puts
> HBase behind it instead of an RDBMS.  We did this because we found that
> most of the inefficiencies of Hive's metadata access had to do with the
> layout of the RDBMS and the way it was accessed.  In the same work I built
> short-circuit options in to avoid using thrift and enable sharing of
> objects across HiveMetaStore and HiveMetaStoreClient.
>
> On the backwards incompatibilities, yes IMetaStoreClient evolves in lock
> step with the thrift interface.  My point was we often add calls, add new
> fields to structs, etc.  Your code would still compile in these cases, new
> features just wouldn't work.  Given that a couple of major Hadoop support
> vendors now support rolling upgrades, there are devs interested in making
> sure that client version x works properly with server version x+1.
>
> Still, we don't test for the use case you are proposing so we could end up
> breaking your code without knowing it.
>
> When I said it wasn't external, I meant we did not expect end users to
> write code against it (like say the UDF interface).  Yes it's external to
> the metastore package as you point out.
>
> Alan.
>
> Austin Lee <austin.t....@gmail.com>
> December 15, 2015 at 10:46
> Yes, a more efficient implementation is what I am trying to achieve.  I
> also want to retain the ability to talk to a remote metastore that is not
> necessarily thrift.
>
> To be more precise, what I would like is a more efficient metastore.  In
> looking at the current architecture, I came to the conclusion that there are
> three logical boundaries where I can inject an improved implementation or
> alternative to what Hive offers in the metastore space.
>
> 1) RawStore
> I think the existing mechanism that Hive offers users to choose from major
> RDBMSes works fine.  I suppose there's still room for improvement here, but
> the impact of those improvements would be limited to the storage aspects of
> metadata.
>
> 2) Thrift server
> An alternative HiveMetaStore that talks Hive Metastore Thrift.  It's
> almost a coin toss between this and #3, but I think for the reasons I will
> state below, #3 is preferable.
>
> 3) IMetaStoreClient
> I feel this gives me the most freedom since I can be embedded or remote.
> I am not tied to the Thrift interface or the RawStore interface, if I
> choose to roll my own.
>
> One thing that does concern me is your statement about IMetaStoreClient
> being an internal interface, which is true.  Do the changes to this
> interface really happen ad-hoc?  Doesn't it evolve in lock step with the
> Thrift interface?  If so, wouldn't backward compatibility guarantees for
> Thrift translate to backward compatibility guarantees for this interface as
> well?  From the way it is used by Query Planning, I think it could be made
> an "external" interface that belongs in hive-metastore.
>
>
> Alan Gates <alanfga...@gmail.com>
> December 15, 2015 at 10:14
> I don't see an issue with this, it seems fine.  One caveat though is we
> see this as an internal interface and we change it all the time.  I
> wouldn't want to be pushed into making backwards compatibility guarantees
> for IMetaStoreClient.  Which means that if you develop a different
> implementation of it outside Hive it will likely break on every upgrade.
>
> I don't understand your example use case.  You can run Hive now without
> the thrift server, so I'm guessing that's not what you're really trying to
> do.  Are you just interested in building a more efficient implementation or
> do you have another use case in mind?
>
> Alan.
>
> Austin Lee <austin.t....@gmail.com>
> December 14, 2015 at 20:48
> Hi,
>
> I would like to propose a change that would make it possible for users to
> choose an implementation of IMetaStoreClient via HiveConf, i.e.
> hive-site.xml. Currently, the choice is hard-coded in Hive to be
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
> There is no direct reference to SessionHiveMetaStoreClient other than
> the hard-coded class name in Hive.java, and the QL component operates only
> on the IMetaStoreClient interface, so the change would be minimal. It
> would be quite similar to how an implementation of RawStore is specified
> and loaded in hive-metastore. One use case this change would serve would
> be one where a user wishes to use an implementation of this interface
> without the dependency on the Thrift server. I would appreciate the
> community's input and feedback on this proposal.
>
> Thank you,
> Austin
>
>
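
The proposal in the original message (picking the IMetaStoreClient
implementation by class name from configuration instead of hard coding it)
could be sketched roughly as follows. This is only an illustration with no
Hive dependency, not the actual Hive code: MetaStoreClient stands in for
IMetaStoreClient, DefaultClient plays the role of SessionHiveMetaStoreClient,
and CustomClient, MetaStoreClientFactory, and the key
"example.metastore.client.impl" are hypothetical names, not real Hive classes
or properties.

```java
import java.util.Properties;

// Small demo: load the default, then a configured alternative.
class MetaStoreClientDemo {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();

        // No key set: the factory falls back to the default implementation.
        System.out.println(MetaStoreClientFactory.create(conf).describe());

        // Key set: the factory loads the named class by reflection.
        conf.setProperty(MetaStoreClientFactory.IMPL_KEY, CustomClient.class.getName());
        System.out.println(MetaStoreClientFactory.create(conf).describe());
    }
}

// Stand-in for Hive's IMetaStoreClient; the real interface has many more methods.
interface MetaStoreClient {
    String describe();
}

// Plays the role of the currently hard-coded default implementation.
class DefaultClient implements MetaStoreClient {
    public String describe() { return "default"; }
}

// Hypothetical user-supplied alternative, e.g. one that does not use Thrift.
class CustomClient implements MetaStoreClient {
    public String describe() { return "custom"; }
}

final class MetaStoreClientFactory {
    // Hypothetical configuration key, used here only for illustration.
    static final String IMPL_KEY = "example.metastore.client.impl";

    private MetaStoreClientFactory() {}

    // Instantiate the configured class, or the default when the key is unset.
    static MetaStoreClient create(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(IMPL_KEY, DefaultClient.class.getName());
        // asSubclass rejects classes that do not implement the interface.
        Class<? extends MetaStoreClient> cls =
            Class.forName(className).asSubclass(MetaStoreClient.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Because callers only ever see the interface, swapping the implementation
needs no change outside the factory, which is the property the proposal
relies on. A real implementation would likely need a constructor that takes
the configuration rather than the no-argument constructor assumed here.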
