Re: Caching metastore objects

Alan Gates Tue, 26 May 2015 10:58:14 -0700

Sivaramakrishnan Narayanan <mailto:tarb...@gmail.com>
May 25, 2015 at 20:19

Apologies if this has been discussed in the past - my searches did notpullup any relevant threads. If there are better solutions available outof the

box, please let me know!

Problem statement
--------------------------

We have a setup where a single metastoredb is used by Hive, Presto and
SparkSQL. In addition, there are 1000s of hive queries submitted in batch
form from multiple machines. Oftentimes, the metastoredb ends up being
remote (in a different region in AWS etc) and round-trip latency is high.
We've seen single thrift calls getting translated into lots of small SQL

calls by datanucleus and the roundtrip latency ends up killingperformance.Furthermore, any of these systems may create / modify a hive table andthis

should be reflected in the other system. Example, I may create a table in
hive and query it using Presto or vice versa. In our setup, there may be
multiple thrift metastore servers pointing to the same metastore db.

Investigation
-------------------

Basically, we've been looking at caching to solve this problem (will come
to invalidation in a bit). I looked briefly at DN's support for caching -
these two parameters seem to be switched off by default.

METASTORE_CACHE_LEVEL2("datanucleus.cache.level2", false),
METASTORE_CACHE_LEVEL2_TYPE("datanucleus.cache.level2.type", "none"),

Furthermore, my reading of
http://www.datanucleus.org/products/datanucleus/jdo/cache.html suggests
that there is no sophistication in invalidation - seems like only

time-based invalidation is supported and it can't work across multiplePMFs

(therefore, multiple thrift metastore servers)

Solution Outline
-----------------------

- Every table / partition will have an additional property called
'version'
- Any call that modifies table or partition will bump up version of the
table / partition
- Guava based cache of thrift objects that come from metastore calls
- We fire a single SQL matching versions before returning from cache
- It is conceivable to have a mode wherein invalidation based on version
happens in a background thread (for higher performance, lower fidelity)
- Not proposing any locking (not shooting for world peace here :) )
- We could extend HiveMetaStore class or create a new server altogether

I think you want to do this at the ObjectStore level, not theHiveMetaStore level. Since the guava caching includes knowledge of howto fetch the item into the cache the details of how to actually thefetch the item will bleed into your caching layer. You don't want toput SQL directly into the HiveMetaStore layer since there arealternative, non-SQL implementations of that layer (see below).


Is this something that would be interesting to the community? Is this
problem already solved and should I spend my time watching GoT instead?

There is work going on to enable storing metadata in HBase instead of anRDBMS. One of the explicit goals of this work is to radically reducethe number of round trips between Hive and the metadata store. Soinstead of one thrift call resulting in many SQL calls it will result ina one or at most a few HBase calls. This may or may not solve yourproblem, and it may be much more radical a solution than you desire.Also, it isn't stable and tested yet so you would have to wait a whilefor it. But if you are interested its happening in the hbase-metastorebranch of Hive.


Alan.


Thanks
Siva

Re: Caching metastore objects

Reply via email to