[ 
https://issues.apache.org/jira/browse/HIVE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618143#comment-16618143
 ] 

Ashutosh Chauhan commented on HIVE-20558:
-----------------------------------------

The default is currently 2. This affects the initial size of the hashtable and 
may cause rehashing when the hashtable needs to grow. That can lead to two 
issues: a) performance loss during rehashing; b) getting killed by the LLAP OOM 
killer, since the hashtable doubles in size on every rehash.
The proposal is to set this to 0.99 to avoid these problems. With 0.99 we 
allocate, on the very first allocation, all the memory that the compiler 
estimates the runtime will need. This avoids both issues above, because no 
rehashing is needed once the estimated memory is already allocated. The 
downside is that we may reserve more memory up front than is actually needed. 
But given recent enhancements in compiler estimates, and based on perf testing, 
this looks like a good choice.
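
To make the effect of the setting concrete, here is a minimal sketch, not 
Hive's actual code. It assumes the compiler's key-count estimate is divided by 
the adjustment to get the initial capacity, and that the table doubles on every 
rehash. It ignores load factor and power-of-two rounding. The class and method 
names (HashTableSizing, initialCapacity, rehashCount) are made up for 
illustration.

```java
public class HashTableSizing {

    // Assumption: initial capacity = estimated key count / adjustment.
    // Load factor and power-of-two rounding are deliberately left out.
    static long initialCapacity(long estimatedKeys, double adjustment) {
        return (long) Math.ceil(estimatedKeys / adjustment);
    }

    // Counts how many doublings are needed before the table can hold
    // actualKeys entries, starting from the given initial capacity.
    static int rehashCount(long initialCapacity, long actualKeys) {
        int rehashes = 0;
        long capacity = Math.max(1L, initialCapacity);
        while (capacity < actualKeys) {
            capacity *= 2; // each rehash doubles the table
            rehashes++;
        }
        return rehashes;
    }

    public static void main(String[] args) {
        long estimatedKeys = 1_000_000L; // hypothetical compiler estimate
        for (double adjustment : new double[] {2.0, 0.99}) {
            long capacity = initialCapacity(estimatedKeys, adjustment);
            int rehashes = rehashCount(capacity, estimatedKeys);
            System.out.printf("adjustment=%.2f initial=%d rehashes=%d%n",
                    adjustment, capacity, rehashes);
        }
    }
}
```

Under these assumptions, an adjustment of 2 starts the table at half the 
estimate, so it has to double at least once if the estimate is accurate, while 
0.99 starts slightly above the estimate and needs no rehash.
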
A future enhancement would be to grow hashtables exponentially at first and 
then linearly, to get a good tradeoff between CPU and memory. However, we may 
need to devise new rehashing mechanisms to do this effectively.
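
One possible shape for such a growth policy, purely as an illustration: double 
the capacity while the table is small, then switch to a fixed increment once it 
passes a threshold. The names and parameters here (GrowthPolicy, nextCapacity, 
linearThreshold, linearStep) are hypothetical and do not exist in Hive, and the 
sketch says nothing about how the rehash itself would be done.

```java
public class GrowthPolicy {

    // Hypothetical policy: double while below linearThreshold,
    // then grow by a fixed linearStep on each resize.
    static long nextCapacity(long current, long linearThreshold, long linearStep) {
        if (current < linearThreshold) {
            return current * 2;
        }
        return current + linearStep;
    }

    public static void main(String[] args) {
        long capacity = 1024L;
        // Print a few successive capacities to show the switch-over point.
        for (int i = 0; i < 5; i++) {
            System.out.println(capacity);
            capacity = nextCapacity(capacity, 4096L, 1024L);
        }
    }
}
```

The linear phase caps the extra memory taken per resize at linearStep, at the 
cost of more frequent resizes than pure doubling.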

>  Change default of hive.hashtable.key.count.adjustment to 0.99
> --------------------------------------------------------------
>
>                 Key: HIVE-20558
>                 URL: https://issues.apache.org/jira/browse/HIVE-20558
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20558.patch
>
>
> Current default is 2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
