[ https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandeep Gade updated HIVE-26669:
--------------------------------
Description:

We are experiencing issues with the Hive Metastore where it becomes unresponsive. Initial investigation shows thousands of threads in the WAITING (parking) state:

    1 java.lang.Thread.State: BLOCKED (on object monitor)
  772 java.lang.Thread.State: RUNNABLE
    2 java.lang.Thread.State: TIMED_WAITING (on object monitor)
   13 java.lang.Thread.State: TIMED_WAITING (parking)
    5 java.lang.Thread.State: TIMED_WAITING (sleeping)
    3 java.lang.Thread.State: WAITING (on object monitor)
14308 java.lang.Thread.State: WAITING (parking)

==============

Almost all of the threads are stuck at 'parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)':

   15 - parking to wait for <0x00007f9ad06c9c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
14288 - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    1 - parking to wait for <0x00007f9ad0a161f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad0a39248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad0adb0a0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
    5 - parking to wait for <0x00007f9ad0b12278> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad0b12518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad0b44878> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad0cbe8f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad1318d60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    1 - parking to wait for <0x00007f9ad1478c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    5 - parking to wait for <0x00007f9ad1494ff8> (a java.util.concurrent.SynchronousQueue$TransferQueue)

======================

Complete stack:

"pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 nid=0x62011 waiting on condition [0x00007f959d917000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
        at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy30.get_database(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

   Locked ownable synchronizers:
        - <0x00007fae9f0d8c20> (a java.util.concurrent.ThreadPoolExecutor$Worker)

======================

Looking at the Linux process limits, Hive exhausts its max processes count while the issue is happening. The limit is set to:

Max processes    16000    16000    processes

As a workaround, we restart the Metastores and they work fine for a few days.


> Hive Metastore becomes unresponsive
> -----------------------------------
>
>                 Key: HIVE-26669
>                 URL: https://issues.apache.org/jira/browse/HIVE-26669
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.1.0
>            Reporter: Sandeep Gade
>            Priority: Critical
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
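For reference, the per-state and per-lock-address counts quoted in the description can be reproduced from any thread dump with standard shell tools. A minimal sketch (the inline dump is a stand-in; in production it would come from `jstack -l <hms_pid>`):

```shell
# Stand-in for a real dump; in production: jstack -l <hms_pid> > jstack.out
cat > jstack.out <<'EOF'
"pool-8-thread-1"
   java.lang.Thread.State: WAITING (parking)
        - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"pool-8-thread-2"
   java.lang.Thread.State: WAITING (parking)
        - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"main"
   java.lang.Thread.State: RUNNABLE
EOF

# Count threads by state -- produces the
# "14308 java.lang.Thread.State: WAITING (parking)" style summary.
grep 'java.lang.Thread.State:' jstack.out | sed 's/^ *//' | sort | uniq -c | sort -rn

# Count parked threads by the lock address they wait on; a single dominant
# address (here the ObjectStore.setConf ReentrantLock) means one contended lock.
grep 'parking to wait for' jstack.out | sed 's/^ *- *//' | sort | uniq -c | sort -rn
```

With the real 15k-thread dump, the second command surfaces <0x00007f9ad0795c48> at the top, which is what pointed the investigation at the lock taken in ObjectStore.setConf.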
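The "Max processes 16000" figure is the kernel's per-process task limit, and on Linux each JVM thread counts as a task, so ~16k worker threads is enough to exhaust it. A sketch of how to watch the Metastore approach the limit, assuming a Linux /proc filesystem (the current shell `$$` stands in for the HMS PID, which in production would come from something like `pgrep -f HiveMetaStore`):

```shell
# Stand-in PID: the current shell. In production, use the Metastore JVM's PID.
PID=$$

# Soft and hard task limits for the process (the "Max processes 16000 16000" row).
grep 'Max processes' "/proc/${PID}/limits"

# Current number of tasks (threads) in the process -- compare to the limit above.
ls "/proc/${PID}/task" | wc -l
```

When the task count approaches the soft limit, thread creation starts failing and the service appears hung, which matches the observed need to restart the Metastores.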