Thank you so much, Alan, for your prompt responses and for the information you provided. I will have a look at the HBase work.
I am new to the process and it's not 100% clear to me, but the wiki seems to suggest I should use this forum to reach consensus on a proposal before creating a JIRA ticket. If the "why" of my proposal is clear, I would like to create a JIRA ticket and take this through the rest of the process via JIRA. Does that sound good?

Thanks,
Austin

On Tue, Dec 15, 2015 at 11:04 AM, Alan Gates <alanfga...@gmail.com> wrote:

> For work along the same lines you should check out the HBase metastore
> work in Hive 2.0. It still uses the thrift server and RawStore but puts
> HBase behind it instead of an RDBMS. We did this because we found that
> most of the inefficiencies of Hive's metadata access had to do with the
> layout of the RDBMS and the way it was accessed. In the same work I built
> short-circuit options in to avoid using thrift and enable sharing of
> objects across HiveMetaStore and HiveMetaStoreClient.
>
> On the backwards incompatibilities, yes IMetaStoreClient evolves in lock
> step with the thrift interface. My point was we often add calls, add new
> fields to structs, etc. Your code would still compile in these cases; new
> features just wouldn't work. Given that a couple major Hadoop support
> vendors now support rolling upgrades there are devs interested in making
> sure that client version x works properly with server version x+1.
>
> Still, we don't test for the use case you are proposing so we could end up
> breaking your code without knowing it.
>
> When I said it wasn't external, I meant we did not expect end users to
> write code against it (like say the UDF interface). Yes it's external to
> the metastore package as you point out.
>
> Alan.
>
> Austin Lee <austin.t....@gmail.com>
> December 15, 2015 at 10:46
> Yes, a more efficient implementation is what I am trying to achieve. I
> also want to retain the ability to talk to a remote metastore that is not
> necessarily thrift.
>
> To be more precise, what I would like is a more efficient metastore.
> In looking at the current architecture, I came to the conclusion that there are
> three logical boundaries where I can inject an improved implementation or
> alternative to what Hive offers in the metastore space.
>
> 1) RawStore
> I think the existing mechanism that Hive offers users to choose from major
> RDBMSes works fine. I suppose there's still room for improvement here, but
> the impact of those improvements would be limited to the storage aspects of
> metadata.
>
> 2) Thrift server
> An alternative HiveMetaStore that talks Hive Metastore Thrift. It's
> almost a coin toss between this and #3, but I think for the reasons I will
> state below, #3 is preferable.
>
> 3) IMetaStoreClient
> I feel this gives me the most freedom since I can be embedded or remote.
> I am not tied to the Thrift interface or the RawStore interface, if I
> choose to roll my own.
>
> One thing that does concern me is your statement about IMetaStoreClient
> being an internal interface, which is true. Do the changes to this
> interface really happen ad hoc? Doesn't it evolve in lock step with the
> Thrift interface? If so, wouldn't backward compatibility guarantees for
> Thrift translate to backward compatibility guarantees for this interface as
> well? From the way it is used by Query Planning, I think it could be made
> an "external" interface that belongs in hive-metastore.
>
>
> Alan Gates <alanfga...@gmail.com>
> December 15, 2015 at 10:14
> I don't see an issue with this; it seems fine. One caveat though is we
> see this as an internal interface and we change it all the time. I
> wouldn't want to be pushed into making backwards compatibility guarantees
> for IMetaStoreClient. Which means that if you develop a different
> implementation of it outside Hive it will likely break on every upgrade.
>
> I don't understand your example use case. You can run Hive now without
> the thrift server, so I'm guessing that's not what you're really trying to
> do.
> Are you just interested in building a more efficient implementation or
> do you have another use case in mind?
>
> Alan.
>
> Austin Lee <austin.t....@gmail.com>
> December 14, 2015 at 20:48
> Hi,
>
> I would like to propose a change that would make it possible for users to
> choose an implementation of IMetaStoreClient via HiveConf, i.e.
> hive-site.xml. Currently, in Hive the choice is hard-coded to be
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
> There is no direct reference to SessionHiveMetaStoreClient other than
> the hard-coded class name in Hive.java, and the QL component operates only
> on the IMetaStoreClient interface, so the change would be minimal. It
> would be quite similar to how an implementation of RawStore is specified
> and loaded in hive-metastore. One use case this change would serve would
> be one where a user wishes to use an implementation of this interface
> without the dependency on the Thrift server. I would appreciate the
> community's input and feedback on this proposal.
>
> Thank you,
> Austin
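
For readers following the thread, here is a rough, self-contained sketch of the idea in the original proposal quoted above: choosing the client implementation from configuration instead of hard-coding the class name. It is not Hive code. `MetaClient` is a hypothetical stand-in for IMetaStoreClient, `DefaultMetaClient` stands in for the hard-coded SessionHiveMetaStoreClient, `java.util.Properties` stands in for HiveConf, and the property name `hive.metastore.client.impl` is an assumption made up for illustration, not an existing Hive setting.

```java
import java.util.Properties;

// Hypothetical stand-in for Hive's IMetaStoreClient (the real interface is far larger).
interface MetaClient {
    String describe();
}

// Stand-in for the implementation that is hard-coded today.
class DefaultMetaClient implements MetaClient {
    public DefaultMetaClient(Properties conf) { }
    public String describe() { return "default"; }
}

// An alternative implementation a user might want to plug in.
class EmbeddedMetaClient implements MetaClient {
    public EmbeddedMetaClient(Properties conf) { }
    public String describe() { return "embedded"; }
}

class MetaClientFactory {
    // Assumed property name, for illustration only.
    static final String IMPL_KEY = "hive.metastore.client.impl";

    // Reads the class name from configuration, falling back to the default,
    // and instantiates it reflectively through a (Properties) constructor.
    static MetaClient create(Properties conf) {
        String className = conf.getProperty(IMPL_KEY, DefaultMetaClient.class.getName());
        try {
            Class<? extends MetaClient> cls =
                Class.forName(className).asSubclass(MetaClient.class);
            return cls.getDeclaredConstructor(Properties.class).newInstance(conf);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Unknown class, missing constructor, or a class that is not a MetaClient.
            throw new IllegalArgumentException("Cannot load metastore client: " + className, e);
        }
    }
}
```

With no property set, `MetaClientFactory.create(conf)` returns the default client; setting the property to another class name swaps the implementation without touching the calling code, which only ever sees the interface. A class that does not implement the interface is rejected up front rather than failing later at a call site.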