[ https://issues.apache.org/jira/browse/HIVE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618143#comment-16618143 ]

Ashutosh Chauhan commented on HIVE-20558:
-----------------------------------------

The default is currently 2. This setting affects the initial size of the hashtable, and a hashtable that starts out too small has to be rehashed when it needs to grow. Rehashing can cause two issues:
a) Perf loss while rehashing.
b) Getting killed by the LLAP OOM killer, since we double the hashtable on every rehash.

The proposal is to set the default to 0.99. With 0.99 we allocate, the very first time, all the memory that the compiler has determined the runtime will need. This avoids both issues above, because the estimated memory is already allocated and no rehash is needed. The downside is that we may reserve more memory upfront than is needed. But with recent enhancements in compiler estimates, and based on perf testing, this looks like a good choice.

A future enhancement would be to grow hashtables exponentially at first and then linearly, to get a good tradeoff between CPU and memory. However, we may need to devise new rehashing mechanisms to do this effectively.

> Change default of hive.hashtable.key.count.adjustment to 0.99
> --------------------------------------------------------------
>
>                 Key: HIVE-20558
>                 URL: https://issues.apache.org/jira/browse/HIVE-20558
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20558.patch
>
>
> Current default is 2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
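
The rehashing tradeoff described in the comment can be illustrated with a small standalone sketch. This is not Hive's implementation: the class and method names are hypothetical, the 0.75 load factor and power-of-two capacities are assumptions, and it assumes the estimated key count is divided by the adjustment value to get the initial sizing. It only counts how many doublings a table would need under those assumptions.

```java
public class HashtableGrowthSketch {
    // Hypothetical load factor, chosen for illustration; not taken from Hive.
    static final double LOAD_FACTOR = 0.75;

    /** Smallest power of two that is >= n (for n >= 1). */
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    /**
     * Initial capacity for a given key estimate.
     * Assumption: the estimate is divided by the adjustment value.
     */
    static long initialCapacity(long estimatedKeys, double adjustment) {
        long adjustedKeys = (long) Math.ceil(estimatedKeys / adjustment);
        return nextPowerOfTwo((long) Math.ceil(adjustedKeys / LOAD_FACTOR));
    }

    /** Number of doublings needed before actualKeys fit under the load factor. */
    static int rehashCount(long capacity, long actualKeys) {
        int rehashes = 0;
        while (actualKeys > capacity * LOAD_FACTOR) {
            capacity <<= 1; // the table doubles on every rehash
            rehashes++;
        }
        return rehashes;
    }

    public static void main(String[] args) {
        long estimate = 1_000_000L; // compiler's key-count estimate
        long actual = 1_000_000L;   // keys actually loaded at runtime
        for (double adjustment : new double[] {2.0, 0.99}) {
            long capacity = initialCapacity(estimate, adjustment);
            System.out.printf("adjustment=%.2f initialCapacity=%d rehashes=%d%n",
                    adjustment, capacity, rehashCount(capacity, actual));
        }
    }
}
```

Under these assumptions, an accurate estimate of 1,000,000 keys with an adjustment of 2 starts at 1,048,576 slots and needs one doubling, while 0.99 starts at 2,097,152 slots and needs none. The table ends up the same size either way, but the second case pays for it upfront instead of during a rehash.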