[ https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336360#comment-15336360 ]

Wei Zheng commented on HIVE-13809:
----------------------------------

The list_bucket_dml_12 failure is unrelated. The test passes locally.

> Hybrid Grace Hash Join memory usage estimation didn't take into account the
> bloom filter size
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13809
>                 URL: https://issues.apache.org/jira/browse/HIVE-13809
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-13809.1.patch
>
>
> Memory estimation is important during hash table loading, because we need to
> decide whether to load the next hash partition into memory or spill it. If we
> assume there is enough memory but that turns out not to be the case, we will
> run into an OOM problem.
> Currently, hybrid grace hash join memory usage estimation does not take the
> bloom filter size into account. In large test cases (TB scale) the bloom
> filter grows to hundreds of MB, big enough to cause estimation errors.
> The solution is to include the bloom filter size in the memory estimation.
> Another issue this patch will fix is a possible NPE due to object cache reuse
> during hybrid grace hash join.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
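
The load-or-spill decision described in the issue can be illustrated with a small standalone sketch. This is not the code from HIVE-13809.1.patch and does not use Hive's classes; the class name, method name, and the sizes below are all hypothetical, chosen only to show how leaving the bloom filter out of the estimate can lead to keeping one partition too many in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hive's actual hybrid grace hash join code.
class SpillEstimateSketch {

    /**
     * Decides, partition by partition, whether to keep it in memory (true)
     * or spill it (false). The bloom filter size is charged against the
     * budget before any partition is considered.
     */
    public static List<Boolean> planPartitions(long memoryBudget,
                                               long bloomFilterBytes,
                                               long[] partitionBytes) {
        List<Boolean> keptInMemory = new ArrayList<>();
        long used = bloomFilterBytes; // count the bloom filter up front
        for (long size : partitionBytes) {
            if (used + size <= memoryBudget) {
                used += size;
                keptInMemory.add(true);
            } else {
                keptInMemory.add(false); // would exceed the budget: spill
            }
        }
        return keptInMemory;
    }

    public static void main(String[] args) {
        long budget = 1000L;             // made-up budget, in MB
        long[] parts = {300L, 300L, 300L};

        // Estimate that ignores the bloom filter: everything appears to fit.
        System.out.println(planPartitions(budget, 0L, parts));
        // Estimate that counts a 200 MB bloom filter: the last partition spills.
        System.out.println(planPartitions(budget, 200L, parts));
    }
}
```

With these made-up numbers the first call keeps all three partitions, while the second spills the third. If the bloom filter really occupies 200 MB, the first plan would need 1100 MB against a 1000 MB budget, which is the kind of estimation error the issue describes.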