[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez

Lefty Leverenz (JIRA) Tue, 21 Jul 2015 22:57:14 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636347#comment-14636347
 ]


Lefty Leverenz commented on HIVE-10673:
---------------------------------------

Doc note:  *hive.optimize.dynamic.partition.hashjoin* should be documented in 
the wiki.  Does it belong in the Tez section of Configuration Properties, or 
should it go in the general query execution section and just be added to the 
list of related parameters at the beginning of the Tez section?

* [Configuration Properties -- Tez | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]
 
* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Commit error:  The commit to master was mislabeled HIVE-11303: Getting Tez 
LimitExceededException after dag execution on large query (commit ID 
04d54f61c9f56906160936751e772080c079498c).  The actual HIVE-11303 has commit ID 
72f97fc7760134465333983fc40766e9e864e643.

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>
>                 Key: HIVE-10673
>                 URL: https://issues.apache.org/jira/browse/HIVE-10673
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>              Labels: TODOC1.3
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, 
> HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, 
> HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, 
> HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch
>
>
> Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found that 
> about 2/3 of the CPU time was spent on sorting/merging.
> MR cannot avoid this, but on other execution engines (such as Tez) it is 
> possible to create a reduce-side join that uses unsorted inputs in order to 
> eliminate the sorting, which may be faster than a shuffle join. To join on 
> unsorted inputs, we can use the hash join algorithm to perform the join in 
> the reducer. This requires the small tables in the join to fit in the 
> reducer's in-memory hash table.
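
For illustration only, here is a minimal sketch of the idea in the description: rows are routed to reducers by a hash of the join key without sorting, and each reducer builds a hash table on its slice of the small side and probes it with the unsorted big side. This is not Hive's or Tez's implementation; the function names (`partition`, `hash_join`, `partitioned_hash_join`) and the list-of-dicts row format are assumptions made up for the example.

```python
from collections import defaultdict


def partition(rows, key, num_reducers):
    """Route rows to reducers by hash of the join key. No sorting is done."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[key]) % num_reducers].append(row)
    return buckets


def hash_join(small_rows, big_rows, key):
    """Inner-join two unsorted row lists inside one (hypothetical) reducer."""
    # Build phase: the small side must fit in memory as a hash table.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    # Probe phase: stream the big side in arrival order; no sort or merge.
    joined = []
    for row in big_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def partitioned_hash_join(small_rows, big_rows, key, num_reducers=4):
    """Partition both inputs the same way, then hash-join each partition."""
    small_parts = partition(small_rows, key, num_reducers)
    big_parts = partition(big_rows, key, num_reducers)
    out = []
    for small_part, big_part in zip(small_parts, big_parts):
        out.extend(hash_join(small_part, big_part, key))
    return out
```

Because both inputs are partitioned with the same hash function, matching keys always land in the same partition, so each reducer only needs its own slice of the small table in memory rather than the whole table.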



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
