[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556475#comment-14556475 ]
Mostafa Mokhtar commented on HIVE-10793:
----------------------------------------

[~sershe] HybridHybrid can create an arbitrary number of partitions, depending on the data size and the available memory. If WriteBufferSize were used as is, allocating one buffer of that size per partition could hit an OOM in the constructor, so it can't be used directly.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10793
>                 URL: https://issues.apache.org/jira/browse/HIVE-10793
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10793.1.patch
>
>
> HybridHashTableContainer allocates memory based on an estimate, which means
> that if the actual data size is less than the estimate, the excess allocated
> memory goes unused.
> The number of partitions is calculated from the estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize,
>     minNumParts, minWbSize, nwayConf);
> {code}
> Then writeBufferSize is set based on the number of partitions:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition allocates one WriteBuffer, with no further allocation
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only
> X% of the memory is preallocated.
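To make the trade-off in this thread concrete, here is a minimal sketch (not Hive's HybridHashTableContainer code) that only models how much memory the constructor preallocates, assuming one write buffer per partition as the description states. It compares three sizing strategies: the current estimate-based size, the suggested "only X%" reduction, and a fixed configured WriteBufferSize. The class name WriteBufferSizing, the preallocFraction knob, using minWbSize as a per-buffer floor, and all example sizes are hypothetical choices made for illustration.

```java
// Hypothetical sketch, not Hive's HybridHashTableContainer. It only models
// the memory preallocated in the constructor: one write buffer per partition.
final class WriteBufferSizing {

    // Current behavior from the issue: split the whole estimate across partitions.
    static int estimateBasedWbSize(long estimatedTableSize, int numPartitions) {
        return (int) (estimatedTableSize / numPartitions);
    }

    // Suggested fix: preallocate only a fraction of the estimate (the "X%"),
    // clamped to a minimum buffer size. preallocFraction and the floor are
    // hypothetical parameters, not Hive configuration.
    static int reducedWbSize(long estimatedTableSize, int numPartitions,
                             double preallocFraction, int minWbSize) {
        long perPartition = (long) (estimatedTableSize * preallocFraction) / numPartitions;
        return (int) Math.max(minWbSize, perPartition);
    }

    // Bytes allocated up front: numPartitions buffers of writeBufferSize bytes each.
    static long upfrontBytes(int numPartitions, int writeBufferSize) {
        return (long) numPartitions * writeBufferSize;
    }

    public static void main(String[] args) {
        long estimatedTableSize = 512L << 20; // 512 MiB estimate (example value)
        int minWbSize = 512 << 10;            // 512 KiB floor (example value)
        int configuredWbSize = 8 << 20;       // fixed WriteBufferSize from config (example value)

        for (int numPartitions : new int[] {16, 64, 1024}) {
            long full = upfrontBytes(numPartitions,
                    estimateBasedWbSize(estimatedTableSize, numPartitions));
            long reduced = upfrontBytes(numPartitions,
                    reducedWbSize(estimatedTableSize, numPartitions, 0.25, minWbSize));
            long fixed = upfrontBytes(numPartitions, configuredWbSize);
            System.out.printf("parts=%d full=%dMiB reduced=%dMiB fixed=%dMiB%n",
                    numPartitions, full >> 20, reduced >> 20, fixed >> 20);
        }
    }
}
```

With these example values, the estimate-based size always preallocates the full 512 MiB regardless of the partition count, the 25% reduction preallocates 128 MiB for 16 and 64 partitions, and the fixed 8 MiB buffer grows from 128 MiB to 8192 MiB as the partition count rises to 1024. That last column is the OOM risk described in the comment. Note that at 1024 partitions the hypothetical floor kicks in and the reduced size climbs back to 512 MiB, so any per-buffer minimum also scales with an arbitrary partition count.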