Re: Hive Metastore Bottleneck

2016-03-30 Thread Jörn Franke
Is the MySQL database virtualized? Are there bottlenecks in the storage behind the MySQL database? Could the network be a bottleneck? Are firewalls blocking new connections in case of a sudden connection increase? > On 30 Mar 2016, at 23:28, Udit Mehta wrote: > > Hi all, > > We are currently running Hive in productio

Re: Hive Metastore Bottleneck

2016-03-30 Thread Gautam
Mich, Thanks for clarifying. My words were probably misleading :-) Udit, Fallback (or HA) is different from load balancing. Having load issues need not mean it is unreachable, so your second service. You want equal load on both metastore services so a simple Round Robin load balancer sho
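
To illustrate the distinction drawn above between fallback (HA) and load balancing, here is a minimal Python sketch. It is not Hive code: the endpoint hostnames, the `is_reachable` callback, and both helper names are hypothetical, and the fallback function only models the client behaviour as described in this thread (always try the first URI, move on only if it is unreachable).

```python
from itertools import cycle
from typing import Callable, Iterable, List


def pick_with_fallback(uris: List[str], is_reachable: Callable[[str], bool]) -> str:
    """Fallback (HA) policy: return the first URI that is reachable.

    While the first service stays reachable it receives every request,
    even if it is overloaded.
    """
    for uri in uris:
        if is_reachable(uri):
            return uri
    raise RuntimeError("no metastore service is reachable")


class RoundRobinBalancer:
    """Round-robin policy: hand out the URIs in rotation, one per request."""

    def __init__(self, uris: Iterable[str]) -> None:
        self._uris = list(uris)
        if not self._uris:
            raise ValueError("at least one URI is required")
        self._rotation = cycle(self._uris)

    def pick(self) -> str:
        return next(self._rotation)


# Hypothetical metastore service endpoints (placeholders, not from the thread).
URIS = [
    "thrift://metastore-1.example.com:9083",
    "thrift://metastore-2.example.com:9083",
]

if __name__ == "__main__":
    # Both services are up: fallback keeps choosing the first one.
    print([pick_with_fallback(URIS, lambda uri: True) for _ in range(4)])

    # Round robin alternates between the two services.
    balancer = RoundRobinBalancer(URIS)
    print([balancer.pick() for _ in range(4)])
```

With both services reachable, the fallback policy sends all four requests to the first endpoint, while the round-robin policy alternates between the two, which is the "equal load" behaviour a load balancer in front of the metastore services would provide.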

Re: Hive Metastore Bottleneck

2016-03-30 Thread Udit Mehta
But don't the clients always pick the first URI among the multiple instances listed in the "hive.metastore.uris" config and fall back to the others only if the first is unreachable? This way, we would still have a bottleneck, right? Can you give a little more information on your setup and how you enable l

Re: Hive Metastore Bottleneck

2016-03-30 Thread Mich Talebzadeh
Hi Gautam, when you stated "Have you tried putting multiple metastores behind a load balancer", it should read "Have you tried putting multiple *metastore services*". Basically there is only one backend database, AKA the metastore. Unless you have set up bi-directional replication on the database, t

Re: Hive Metastore Bottleneck

2016-03-30 Thread Gautam
The metastore service is a Java process that is a Thrift server, so you can run multiple such Hive metastore instances with "javax.jdo.option.ConnectionURL" pointing to the same MySQL DB. On Wed, Mar 30, 2016 at 3:11 PM, Mich Talebzadeh wrote: > > > Can you clarify this please > > "Have you

Re: Hive Metastore Bottleneck

2016-03-30 Thread Mich Talebzadeh
Can you clarify this, please: "Have you tried putting multiple metastores behind a load balancer"? Are you implying that the metastore and the backend DB are different entities here? As far as I know $HIVE_HOME/bin/hive --service metastore & starts Hive threads to the backend database/metastore and Hive se

Re: Hive Metastore Bottleneck

2016-03-30 Thread Udit Mehta
I was looking at hive.metastore.max.server.threads, but reading more into it tells me it's a config for the Thrift server and not the metastore. Most of our applications accessing the metastore are Spark SQL applications which do INSERT operations on multiple partitions on an hourly basis. This

Re: Hive Metastore Bottleneck

2016-03-30 Thread Gautam
Can you elaborate on where you see the bottleneck? A general overview of your access path would be useful. For instance, whether you're accessing the Hive metastore via HiveServer2, from WebHCat using the embedded CLI, or something else. Have you tried putting multiple metastores behind a load balancer? It's

Re: Hive Metastore Bottleneck

2016-03-30 Thread Mich Talebzadeh
Are you talking about an increase in the number of threads from the HiveServer2 connection to your database (MySQL)? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Hive Metastore Bottleneck

2016-03-30 Thread Udit Mehta
Hi all, We are currently running Hive in production and staging with the metastore connecting to a MySQL database in the backend. The traffic accessing the metastore in production is higher than in staging, which is expected. We have had a sudden increase in traffic which has led to the metastore operat