Yes, a more efficient implementation is what I am trying to achieve.  I
also want to retain the ability to talk to a remote metastore that does
not necessarily speak Thrift.

To be more precise, what I would like is a more efficient metastore.  In
looking at the current architecture, I came to the conclusion that there
are three logical boundaries where I could inject an improved
implementation of, or an alternative to, what Hive offers in the metastore
space.

1) RawStore
I think the existing mechanism that Hive offers users to choose from major
RDBMSes works fine.  I suppose there's still room for improvement here, but
the impact of those improvements would be limited to the storage aspects of
metadata.

2) Thrift server
An alternative to HiveMetaStore that speaks the Hive Metastore Thrift
protocol.  It's almost a coin toss between this and #3, but for the reasons
I state below, I think #3 is preferable.

3) IMetaStoreClient
I feel this gives me the most freedom, since my implementation can be
embedded or remote.  I am not tied to the Thrift interface or the RawStore
interface if I choose to roll my own.
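
A minimal sketch of how an implementation could be picked at this boundary,
assuming the class name comes from configuration and is loaded reflectively.
Everything here is a hypothetical stand-in: MetaStoreClient replaces the much
larger IMetaStoreClient, java.util.Properties replaces HiveConf, and the key
"example.metastore.client.impl" is not an existing Hive property.  Real
clients would also need a constructor that takes the configuration rather
than the no-argument one used here.

```java
import java.util.Properties;

public class PluggableClientDemo {

    /** Hypothetical stand-in for IMetaStoreClient; the real interface is far larger. */
    public interface MetaStoreClient {
        String describe();
    }

    /** Stand-in for the default client that is hard coded today. */
    public static class DefaultClient implements MetaStoreClient {
        @Override
        public String describe() { return "default"; }
    }

    /** Stand-in for an alternative implementation supplied by the user. */
    public static class CustomClient implements MetaStoreClient {
        @Override
        public String describe() { return "custom"; }
    }

    // Hypothetical configuration key, used only for this illustration.
    public static final String CLIENT_IMPL_KEY = "example.metastore.client.impl";

    /** Loads the configured implementation, falling back to the default class. */
    public static MetaStoreClient load(Properties conf) {
        String className = conf.getProperty(CLIENT_IMPL_KEY, DefaultClient.class.getName());
        try {
            // asSubclass rejects classes that do not implement the interface.
            Class<? extends MetaStoreClient> impl =
                    Class.forName(className).asSubclass(MetaStoreClient.class);
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load metastore client " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).describe());   // no key set: default class is used

        conf.setProperty(CLIENT_IMPL_KEY, CustomClient.class.getName());
        System.out.println(load(conf).describe());   // key set: configured class is used
    }
}
```

The caller only ever sees the interface, which is why the choice of
implementation can move into configuration without touching the code that
uses the client.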

One thing that does concern me is your statement about IMetaStoreClient
being an internal interface, which is true.  Do the changes to this
interface really happen ad hoc?  Doesn't it evolve in lockstep with the
Thrift interface?  If so, wouldn't backward compatibility guarantees for
Thrift translate to backward compatibility guarantees for this interface as
well?  From the way it is used by query planning, I think it could be made
an "external" interface that belongs in hive-metastore.

On Tue, Dec 15, 2015 at 10:14 AM, Alan Gates <alanfga...@gmail.com> wrote:

> I don't see an issue with this, it seems fine.  One caveat though is we
> see this as an internal interface and we change it all the time.  I
> wouldn't want to be pushed into making backwards compatibility guarantees
> for IMetaStoreClient.  Which means that if you develop a different
> implementation of it outside Hive it will likely break on every upgrade.
>
> I don't understand your example use case.  You can run Hive now without
> the thrift server, so I'm guessing that's not what you're really trying to
> do.  Are you just interested in building a more efficient implementation or
> do you have another use case in mind?
>
> Alan.
>
> Austin Lee <austin.t....@gmail.com>
> December 14, 2015 at 20:48
> Hi,
>
> I would like to propose a change that would make it possible for users to
> choose an implementation of IMetaStoreClient via HiveConf, i.e.
> hive-site.xml. Currently, in Hive the choice is hard coded to be
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
> There is no other direct reference to SessionHiveMetaStoreClient other than
> the hard coded class name in Hive.java and the QL component operates only
> on the IMetaStoreClient interface so the change would be minimal and it
> would be quite similar to how an implementation of RawStore is specified
> and loaded in hive-metastore. One use case this change would serve would
> be one where a user wishes to use an implementation of this interface
> without the dependency on the Thrift server. I would appreciate the
> community's input and feedback on this proposal.
>
> Thank you,
> Austin
>
>
