[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289035#comment-14289035
 ] 

Lefty Leverenz commented on HIVE-9277:
--------------------------------------

[~wzheng] put the design doc on the wiki here:  [Hybrid Hybrid Grace Hash Join, v1.0 | https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join,+v1.0].

_Review comment:_  The final graphic in "Recursive Hashing and Spilling" says 
...

bq.  Now we probe using Matchfile 1 against HT 3. Matching values go into 
result. Non-matching values go to Matchfile 4.

... but it shows non-matching values going to Matchfile 4 from HT 4, not HT 3.  A 
dashed line from HT 3 to Matchfile 4 appears to be missing.  And should the text say 
"probe using Matchfile 1 against HT 3 and HT 4 (if it fits in memory)"?
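
To make the question concrete, here is a minimal sketch of the probe step as the quoted sentence describes it, read literally. It is not taken from the design doc or from Hive code: the names {{build_hash_table}} and {{probe}}, the sample rows, and the use of in-memory lists in place of on-disk match files are all assumptions for illustration.

```python
from collections import defaultdict


def build_hash_table(rows):
    """Map each join key to the list of build-side values that carry it."""
    table = defaultdict(list)
    for key, value in rows:
        table[key].append(value)
    return table


def probe(match_rows, hash_table):
    """Probe rows against one hash table.

    Returns (joined, non_matching): joined rows go to the result, and rows
    whose key is absent from this table are kept for a later pass.
    """
    joined, non_matching = [], []
    for key, probe_value in match_rows:
        if key in hash_table:
            for build_value in hash_table[key]:
                joined.append((key, build_value, probe_value))
        else:
            non_matching.append((key, probe_value))
    return joined, non_matching


# Hypothetical data: one spilled build partition re-split into HT 3 and HT 4.
ht3 = build_hash_table([(1, "a"), (3, "c")])
ht4 = build_hash_table([(2, "b")])
matchfile1 = [(1, "x"), (2, "y"), (5, "z")]

# Probe Matchfile 1 against HT 3; rows that miss become Matchfile 4.
result, matchfile4 = probe(matchfile1, ht3)

# Later pass: probe Matchfile 4 against HT 4.
more, unmatched = probe(matchfile4, ht4)
result.extend(more)
```

Under this reading, the rows written to Matchfile 4 come from the probe against HT 3, which is why the dashed line from HT 3 seems to be needed. If HT 4 is also in memory, both probes could happen in one pass and Matchfile 4 would not be needed at all.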

> Hybrid Hybrid Grace Hash Join
> -----------------------------
>
>                 Key: HIVE-9277
>                 URL: https://issues.apache.org/jira/browse/HIVE-9277
>             Project: Hive
>          Issue Type: New Feature
>          Components: Physical Optimizer
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>              Labels: join
>         Attachments: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
> hash join”. This feature provides the following benefits:
> o The query will not fail even if the estimated memory requirement is 
> slightly wrong.
> o Expensive garbage collection overhead can be avoided when the hash table grows.
> o Joins can execute using a Map join operator even when the small table 
> doesn't fit in memory, because spilling some data from the build and probe sides 
> will still be cheaper than having to shuffle the large fact table.
> The design is based on Hadoop’s parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
