[ https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624727#comment-17624727 ]
Chris Nauroth commented on HIVE-26669:
--------------------------------------

Thank you for sharing the additional information. The other thread is blocked attempting to check out a database connection from the Hikari connection pool. This most likely indicates that other threads have checked out all available connections and are holding them for long-running operations.

You may be able to avoid the problem by increasing the maximum number of connections allowed in the pool via the hive-site.xml property {{datanucleus.connectionPool.maxPoolSize}}; a sketch of that change follows below. The default value is 10, documented here: https://cwiki.apache.org/confluence/display/hive/configuration+properties

Alternatively, it might mean that something else is wrong, in which case increasing the pool size would only delay the problem. Consider looking through the full thread dump to find out what other database operations are in flight. Do those connections appear to be hanging indefinitely? Does the database itself appear to be overloaded?
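For illustration, here is a minimal hive-site.xml sketch of that override. The value 50 is only an example, not a recommendation; it assumes your backing database can actually sustain that many concurrent metastore connections.

{code:xml}
<!-- hive-site.xml: raise the DataNucleus/HikariCP connection pool limit used by the metastore. -->
<!-- The default maxPoolSize is 10; 50 below is an illustrative value only. -->
<property>
  <name>datanucleus.connectionPool.maxPoolSize</name>
  <value>50</value>
</property>
{code}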

> Hive Metastore become unresponsive
> ----------------------------------
>
>                 Key: HIVE-26669
>                 URL: https://issues.apache.org/jira/browse/HIVE-26669
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.1.0
>            Reporter: Sandeep Gade
>            Priority: Critical
>
> We are experiencing issues with the Hive Metastore where it becomes unresponsive. Initial investigation shows thousands of threads in the WAITING (parking) state, as shown below:
>
>      1 java.lang.Thread.State: BLOCKED (on object monitor)
>    772 java.lang.Thread.State: RUNNABLE
>      2 java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     13 java.lang.Thread.State: TIMED_WAITING (parking)
>      5 java.lang.Thread.State: TIMED_WAITING (sleeping)
>      3 java.lang.Thread.State: WAITING (on object monitor)
>  14308 java.lang.Thread.State: WAITING (parking)
> ==============
> Almost all of the threads are stuck at 'parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)':
>
>     15 - parking to wait for <0x00007f9ad06c9c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  14288 - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>      1 - parking to wait for <0x00007f9ad0a161f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad0a39248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad0adb0a0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
>      5 - parking to wait for <0x00007f9ad0b12278> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad0b12518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad0b44878> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad0cbe8f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad1318d60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      1 - parking to wait for <0x00007f9ad1478c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      5 - parking to wait for <0x00007f9ad1494ff8> (a java.util.concurrent.SynchronousQueue$TransferQueue)
> ======================
> complete stack:
>
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 nid=0x62011 waiting on condition [0x00007f959d917000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
>     at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>     at com.sun.proxy.$Proxy30.get_database(Unknown Source)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
>     at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
>
>    Locked ownable synchronizers:
>     - <0x00007fae9f0d8c20> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> ======================
> Looking at the Linux process, Hive exhausts its 'max processes' limit while the issue is happening. The limit is set to:
> Max processes             16000                16000                processes
>
> As a workaround, we restart the Metastores, and they work fine for a few days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)