Thanks, Sungwoo Park.

IMO, we should backport HIVE-21206 to branch-3.1.


From: Sungwoo Park <glap...@gmail.com>
Date: Wednesday, 13 October 2021 at 12:28 PM
To: user@hive.apache.org <user@hive.apache.org>
Subject: Re: Hive servers restarting every few hours
Hi,

For 1, Hive 3.1.2 has a bug which leaks Metastore connections. This was 
reported in HIVE-20600:

https://issues.apache.org/jira/browse/HIVE-20600

You might reproduce the bug by inserting values into a table and checking the 
number of connections, e.g.:
0: jdbc:hive2://blue0:9852/> CREATE TABLE leak_test (id int, value string);
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 'world');
...
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 'world');

2021-08-09T02:15:04,263  INFO [HiveServer2-Background-Pool: Thread-250] metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 20
2021-08-09T02:15:04,269  INFO [HiveServer2-Background-Pool: Thread-250] metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 21
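
To watch the count over a longer run, the "current connections" messages can be pulled out of the HiveServer2 log with a small script. This is only a sketch: it assumes the log messages have the same wording as the two lines above, and the function names (connection_counts, net_growth) are made up for illustration.

```python
import re

# Matches the HiveMetaStoreClient messages quoted above. \s+ is used between
# words so the pattern still matches if a mail client wrapped the line.
CONN_EVENT = re.compile(
    r"(Opened|Closed) a connection to metastore,\s+current\s+connections:\s+(\d+)"
)


def connection_counts(log_text):
    """Return (event, count) pairs in the order they appear in the log."""
    return [(m.group(1), int(m.group(2))) for m in CONN_EVENT.finditer(log_text)]


def net_growth(log_text):
    """Difference between the last and the first reported connection count."""
    counts = [n for _, n in connection_counts(log_text)]
    return counts[-1] - counts[0] if counts else 0
```

If net_growth keeps rising across repeated inserts from the same session, that is consistent with the leak described in HIVE-20600; a count that returns to its starting value after each statement would suggest the connections are being released.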

Applying HIVE-21206 can fix the bug:

https://issues.apache.org/jira/browse/HIVE-21206

--- Sungwoo


On Mon, Oct 11, 2021 at 8:34 PM Manikaran Kathuria 
<kathuriamanika...@gmail.com> wrote:
Hi,
I hope everyone is doing well during this pandemic. I have some questions 
related to Hive server configuration. In our current setup, we are running 6 
Hive server instances on k8s pods. We are using Hive 3.1.2 with Java 8. 
The container memory for each pod is 24G. We are observing that the 
Hive servers are crashing with a Java heap OOM error. We have set the max 
heap size to 12G. We are using the Parallel GC collectors, i.e., PS Scavenge 
for the young gen and PS MarkSweep for the old gen. Our observations are as 
follows:
1. The connections to the Hive Metastore kept increasing. Before the server 
crashed, we saw as many as 1.2k connections to the Metastore. Could this be a 
connection leak?
2. We have also observed that a few times the servers crashed because the 
container memory was full. Since we have set the max heap size to 12G, it felt 
strange for the servers to crash because native memory was full. On digging 
into the process map of another instance with high native memory usage (chart 
of the memory used by the Hive server attached), we found that the memory was 
allocated in multiple 64M blocks. These 64M blocks are called arenas. We can 
limit the memory growth by using jemalloc instead of glibc's malloc, or by 
setting the maximum number of allowed arenas. Is this a common issue with Hive 
servers? Any recommendations on how to address the high native memory usage?
3. Another observation: when the Hive servers restarted, we found that the old 
gen space of the heap was full, but the memory committed to the young gen was 
much lower than the maximum allocated to the young gen pool. To be specific 
about one of the instances: total heap 12G, old gen used 8G, young gen used 
360M (committed: 708M, max: 4G). [Chart of heap memory usage attached]. This 
results in consecutive full GCs before the server crashes. Should we consider 
using some other GC? Any recommendations or tuning suggestions?
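
For observation 2, the count of 64M blocks can be estimated from the process map with a short script. This is a rough heuristic sketch, not an exact arena count: it assumes Linux /proc/<pid>/maps formatting, assumes each arena-like block occupies one 64 MiB-aligned window of anonymous memory (possibly split into a writable part and a reserved part), and the function name count_arena_like_regions is made up for illustration.

```python
import re
from collections import defaultdict

ARENA_BYTES = 64 * 1024 * 1024  # the 64M block size mentioned above

# One line of /proc/<pid>/maps: start-end perms offset dev inode [pathname]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+(\d+)\s*(.*)$"
)


def count_arena_like_regions(maps_text):
    """Count 64 MiB-aligned windows fully covered by anonymous mappings."""
    window_bytes = defaultdict(int)
    for line in maps_text.splitlines():
        m = MAPS_LINE.match(line.strip())
        if not m:
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        inode, path = int(m.group(4)), m.group(5).strip()
        # Skip file-backed and named mappings such as libraries or [heap].
        if inode != 0 or path:
            continue
        # Skip mappings that cross a 64 MiB boundary (e.g. the Java heap).
        if start // ARENA_BYTES != (end - 1) // ARENA_BYTES:
            continue
        window_bytes[start // ARENA_BYTES] += end - start
    return sum(1 for total in window_bytes.values() if total == ARENA_BYTES)
```

The input would be the text of /proc/<pid>/maps for the Hive server process. Comparing the result before and after switching to jemalloc, or after capping the arena count (glibc reads this limit from the MALLOC_ARENA_MAX environment variable), gives a quick check of whether the change had any effect.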
Please find the attached charts.
Any help would be highly appreciated.

Thanks,
Manikaran Kathuria
