[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289035#comment-14289035 ]
Lefty Leverenz commented on HIVE-9277:
--------------------------------------

[~wzheng] put the design doc on the wiki here: [Hybrid Hybrid Grace Hash Join, v1.0 | https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join,+v1.0].

_Review comment:_ The final graphic in "Recursive Hashing and Spilling" says ...

bq. Now we probe using Matchfile 1 against HT 3. Matching values go into result. Non-matching values go to Matchfile 4.

... but it shows non-matching values from HT 4, not HT 3, going to Matchfile 4. A dashed line from HT 3 to Matchfile 4 is missing. And should the text say "probe using Matchfile 1 against HT 3 and HT 4 (if it fits in memory)"?

> Hybrid Hybrid Grace Hash Join
> -----------------------------
>
>                 Key: HIVE-9277
>                 URL: https://issues.apache.org/jira/browse/HIVE-9277
>             Project: Hive
>          Issue Type: New Feature
>          Components: Physical Optimizer
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>              Labels: join
>         Attachments: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called “hybrid hybrid grace
> hash join”. It offers the following benefits:
> o The query will not fail even if the estimated memory requirement is
> slightly wrong
> o Expensive garbage collection overhead can be avoided when the hash table
> grows
> o The join can still run as a map join even when the small table doesn't
> fit in memory, because spilling some data from the build and probe sides is
> still cheaper than shuffling the large fact table
> The design is based on Hadoop’s parallel processing capability and the
> significant amount of memory available.