Re: Performance issue with hive metastore

2020-01-31 Thread Peter Vary
Hi Nirav, I am not sure how Spark uses Hive. If the ALTER TABLE SQL is issued through Hive, then Spark does not connect directly to the HMS; it connects to HS2 instead. If it uses only the HMS URI, then the SQL is translated inside Spark, and only metastore calls are sent to the HMS. This i

Re: Performance issue with hive metastore

2020-01-30 Thread Nirav Patel
Thanks for responding, Peter. It does indeed seem to be one session per client (we can see source:10.250.70.14 in every log record). I don't create a session with the Hive Thrift server. Spark basically requires the property "hive.metastore.uris" in the Spark config, which we set to "thrift://hivebox:9083". So

Re: Performance issue with hive metastore

2020-01-30 Thread Peter Vary
Hi Nirav, there are several configurations which could affect the number of parallel queries running in your environment, depending on your Hive version. The Thrift client is not thread-safe, and this causes a bottleneck in the client-to-HS2 and HS2-to-HMS communication. Hive solves this by creating its
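The thread-safety point above can be illustrated with a small, self-contained Python sketch. `FakeMetastoreClient` is a hypothetical stand-in for a Thrift client (no real Hive or Thrift API is used), and its latency is a made-up number. The sketch contrasts one shared client guarded by a lock, where every call waits for the previous one, with one client per worker thread. This is a general concurrency pattern, not a description of how Hive itself manages its connections.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeMetastoreClient:
    """Hypothetical stand-in for a non-thread-safe Thrift client.

    It must not be used by two threads at once; each call simulates
    a fixed network round trip with time.sleep.
    """

    def __init__(self, latency=0.02):
        self.latency = latency
        self.calls = 0

    def alter_partition(self, table, partition):
        time.sleep(self.latency)  # simulated round trip to the server
        self.calls += 1
        return f"{table}/{partition}"


# Pattern 1: one shared client behind a lock. Safe, but all calls are
# serialized, so this single client becomes the bottleneck.
shared_client = FakeMetastoreClient()
shared_lock = threading.Lock()


def call_shared(table, partition):
    with shared_lock:
        return shared_client.alter_partition(table, partition)


# Pattern 2: one client per worker thread (thread-local storage), so
# threads no longer contend for a single connection.
_local = threading.local()


def call_per_thread(table, partition):
    client = getattr(_local, "client", None)
    if client is None:
        client = FakeMetastoreClient()
        _local.client = client
    return client.alter_partition(table, partition)


def run(call, n_ops=40, workers=8):
    """Issue n_ops calls through a thread pool; return (results, seconds)."""
    tables = ["mytable"] * n_ops
    partitions = [f"p={i}" for i in range(n_ops)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(call, tables, partitions))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_shared = run(call_shared)
    _, t_local = run(call_per_thread)
    print(f"shared client + lock: {t_shared:.2f}s")
    print(f"client per thread:    {t_local:.2f}s")
```

In a real deployment the per-thread variant trades lock contention for more open connections to the server, so the worker count still has to stay within what the server side is set up to handle.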

Performance issue with hive metastore

2020-01-29 Thread Nirav Patel
Hi, I am trying to run thousands of Parquet partition update operations on different Hive tables in parallel from my client application. I am using Spark SQL with Hive support enabled in my application to submit Hive queries. spark.sql(" ALTER TABLE mytable PART