Thanks, Sungwoo Park. IMO, we should backport HIVE-21206 to branch-3.1.
From: Sungwoo Park <glap...@gmail.com>
Date: Wednesday, 13 October 2021 at 12:28 PM
To: user@hive.apache.org <user@hive.apache.org>
Subject: Re: Hive servers restarting every few hours

Hi,

For 1, Hive 3.1.2 has a bug which leaks Metastore connections. This was reported in HIVE-20600:

https://issues.apache.org/jira/browse/HIVE-20600

You might reproduce the bug by inserting values into a table and checking the number of connections, e.g.:

0: jdbc:hive2://blue0:9852/> CREATE TABLE leak_test (id int, value string);
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 'world');
...
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 'world');
2021-08-09T02:15:04,263 INFO [HiveServer2-Background-Pool: Thread-250] metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 20
2021-08-09T02:15:04,269 INFO [HiveServer2-Background-Pool: Thread-250] metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 21

Applying HIVE-21206 can fix the bug:

https://issues.apache.org/jira/browse/HIVE-21206

--- Sungwoo

On Mon, Oct 11, 2021 at 8:34 PM Manikaran Kathuria <kathuriamanika...@gmail.com> wrote:

Hi,

I hope everyone is doing well during this pandemic. I have some questions related to Hive server configuration.

In our current setup, we are running 6 Hive server instances on k8s pods. We are using Hive version 3.1.2 with Java 8. The container memory associated with each pod is 24G. We are observing that the Hive servers are crashing with the OOM Java heap error. We have set the max heap size to 12G. We are using the Parallel GC collectors, i.e., PS Scavenge and PS MarkSweep for the young gen and old gen GCs respectively.

Following are our observations:

1. The connections to the Hive metastore kept increasing. Before the server crashed, we saw the number of connections to the metastore reach as high as 1.2k. Connection leakage?

2. We have also observed that a few times the servers crashed because the container memory was full. As we have set the max heap size to 12G, the servers crashing because native memory was full felt strange.
On digging into the process map from another instance using high native memory (chart of the memory used by hive server attached), we found that the memory was allocated in multiple 64M blocks. These 64M blocks are called arenas. We can limit the memory growth by using jemalloc instead of malloc from glibc, or by setting the maximum number of allowed arenas. Is this a common issue in Hive servers? Any recommendations on how to solve this issue of high native memory being used?

3. Another observation: when the Hive servers restarted, we found the old gen space of the heap was full, but the memory committed to young gen was much less than the maximum memory allocated to the young gen pool. To be specific about one of the instances: total heap: 12G; old gen memory used: 8G; young gen used: 360M (committed: 708M, max: 4G). [Chart of heap memory usage attached]. This results in consecutive full GCs before the server crashes. Should we consider using some other GC? Any recommendations or tuning suggestions?

Please find the attached charts. Any help would be highly appreciated.

Thanks,
Manikaran Kathuria
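
For observation 1, one way to confirm the leak is to follow the "current connections" counter that HiveMetaStoreClient writes to the HiveServer2 log, as in the two log lines quoted in the reply. The following is a minimal sketch in Python, assuming the log lines have the same layout as those two lines. The function names `connection_counts` and `net_growth` are hypothetical helpers for illustration, not part of Hive.

```python
import re

# Matches the HiveMetaStoreClient lines quoted in this thread.
# Assumption: other deployments use the same log layout.
CONN_RE = re.compile(
    r"^(?P<ts>\S+)\s+INFO\s+\[(?P<thread>[^\]]+)\]\s+"
    r"metastore\.HiveMetaStoreClient: "
    r"(?P<event>Opened|Closed) a connection to metastore, "
    r"current connections: (?P<count>\d+)"
)


def connection_counts(lines):
    """Return (timestamp, event, count) for every matching log line."""
    samples = []
    for line in lines:
        match = CONN_RE.match(line.strip())
        if match:
            samples.append(
                (match.group("ts"), match.group("event"), int(match.group("count")))
            )
    return samples


def net_growth(samples):
    """Last reported connection count minus the first one (0 if no samples)."""
    if not samples:
        return 0
    return samples[-1][2] - samples[0][2]


if __name__ == "__main__":
    # The two log lines from the reply above.
    sample_log = [
        "2021-08-09T02:15:04,263 INFO [HiveServer2-Background-Pool: Thread-250] "
        "metastore.HiveMetaStoreClient: Closed a connection to metastore, "
        "current connections: 20",
        "2021-08-09T02:15:04,269 INFO [HiveServer2-Background-Pool: Thread-250] "
        "metastore.HiveMetaStoreClient: Opened a connection to metastore, "
        "current connections: 21",
    ]
    samples = connection_counts(sample_log)
    for ts, event, count in samples:
        print(ts, event, count)
    print("net growth:", net_growth(samples))
```

If the count keeps climbing across repeated inserts instead of returning to a baseline, that is consistent with the leak described in HIVE-20600.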