[jira] [Commented] (HIVE-2095) auto convert map join bug

He Yongqiang (JIRA) Thu, 07 Apr 2011 22:27:50 -0700

    [ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017291#comment-13017291 ]


He Yongqiang commented on HIVE-2095:
------------------------------------

Uploading a new patch to address namit's comments.

Note: there is an existing bug in Hive that causes the results of auto_join29.q
to be incorrect. Let's file another JIRA for it.
Basically, if the outer join filter is enabled, the following query gives wrong
results in today's Hive:

    SELECT /*+mapjoin(src1, src2)*/ * FROM src src1
    RIGHT OUTER JOIN src src2 ON (src1.key = src2.key AND src1.key < 10 AND src2.key > 10)
    JOIN src src3 ON (src2.key = src3.key AND src3.key < 10)
    SORT BY src1.key, src1.value, src2.key, src2.value, src3.key, src3.value;

> auto convert map join bug
> -------------------------
>
>                 Key: HIVE-2095
>                 URL: https://issues.apache.org/jira/browse/HIVE-2095
>             Project: Hive
>          Issue Type: Bug
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch
>
>
> 1)
> When choosing a table as the big-table candidate for a map join, if Hive can
> determine at compile time that the total known size of all the other tables
> (excluding the candidate) exceeds a configured value, the candidate is a bad
> one and should not be put into the plan. Otherwise, filtering it out at
> runtime takes more time.
> 2)
> Added a null check for backup tasks; otherwise a NullPointerException is thrown.
> 3)
> CommonJoinResolver needs to know the full pathToAliases mapping; otherwise it
> will make the wrong decision.
> 4)
> Changes made to ConditionalResolverCommonJoin: added pathToAliases,
> aliasToSize (each alias's input size known at compile time, from
> inputSummary), and the intermediate dir path.
> The logic is: go over all entries in pathToAliases, and for each path that is
> under the intermediate dir path, add that path's size to its aliases. Finally,
> choose the big table based on this size information and other information
> such as aliasToTask.
> 5)
> The conditional task's children contain wrong options, which may cause the
> join to fail or produce incorrect results. When getting all possible children
> for the conditional task, Hive should use a whitelist of big tables: only
> tables in this whitelist can be considered as the big table.
> Here is the logic:
> +   * Get a list of big table candidates. Only the tables in the returned set can
> +   * be used as the big table in the join operation.
> +   *
> +   * The logic here is to scan the join condition array from left to right. If
> +   * we see an inner join and bigTableCandidates is empty, add both sides of this
> +   * inner join to the big table candidates. If we see a left outer join and
> +   * bigTableCandidates is empty, add the left side to it; if bigTableCandidates
> +   * is not empty, do nothing (which means bigTableCandidates is from the left
> +   * side). If we see a right outer join, clear bigTableCandidates and add the
> +   * right side to it: the right side of a right outer join always wins. If we
> +   * see a full outer join, return null immediately (no table can be the big
> +   * table, so a map join cannot be done).
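
The candidate-selection rule in item 5 can be sketched in Java as follows. This is a minimal, hypothetical sketch, not the code in HIVE-2095.2.patch: the JoinType enum, the JoinCond class and the use of integer table positions are assumptions standing in for Hive's own join-condition structures (Java 9+ for List.of).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BigTableCandidates {

    // Hypothetical join types standing in for Hive's join condition types.
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Hypothetical join condition: positions of the left and right inputs.
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Scans join conditions left to right; returns the positions that may be
    // the big (streamed) table, or null if no map join is possible.
    static Set<Integer> getBigTableCandidates(List<JoinCond> conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case INNER:
                    // Either side of the first inner join may be the big table.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                        candidates.add(c.right);
                    }
                    break;
                case LEFT_OUTER:
                    // Existing candidates already come from the left side; keep them.
                    if (candidates.isEmpty()) {
                        candidates.add(c.left);
                    }
                    break;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(c.right);
                    break;
                case FULL_OUTER:
                    // No table can be the big table, so no map join.
                    return null;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // t0 RIGHT OUTER JOIN t1, followed by an inner join with t2.
        List<JoinCond> conds = List.of(
                new JoinCond(0, 1, JoinType.RIGHT_OUTER),
                new JoinCond(1, 2, JoinType.INNER));
        System.out.println(getBigTableCandidates(conds)); // prints [1]
    }
}
```

Because the inner join only adds candidates when the set is still empty, the right side of the earlier right outer join is the only table left as a possible big table.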

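Items 1 and 4 together amount to size bookkeeping followed by a threshold check. The sketch below is hypothetical and is not Hive's actual ConditionalResolverCommonJoin: pathSizes stands in for a filesystem size lookup, threshold stands in for the unnamed "configured value", and picking the largest surviving candidate is an assumption (the real resolver also consults aliasToTask, which is omitted here).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BigTableChooser {

    // Adds the size of each path under the intermediate dir to the aliases it maps to.
    // pathSizes is a hypothetical stand-in for asking the filesystem for a path's length.
    static Map<String, Long> aliasSizes(Map<String, List<String>> pathToAliases,
                                        Map<String, Long> aliasToKnownSize,
                                        String intermediateDir,
                                        Map<String, Long> pathSizes) {
        Map<String, Long> sizes = new HashMap<>(aliasToKnownSize);
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            String path = e.getKey();
            if (!path.startsWith(intermediateDir)) {
                continue; // sizes of regular inputs are already known at compile time
            }
            long pathSize = pathSizes.getOrDefault(path, 0L);
            for (String alias : e.getValue()) {
                sizes.merge(alias, pathSize, Long::sum);
            }
        }
        return sizes;
    }

    // Picks the largest candidate whose other tables together stay within threshold;
    // returns null when every candidate is ruled out (keep the common join).
    static String chooseBigTable(Set<String> candidates, Map<String, Long> sizes, long threshold) {
        long total = 0;
        for (long s : sizes.values()) {
            total += s;
        }
        String best = null;
        long bestSize = -1;
        for (String c : candidates) {
            long size = sizes.getOrDefault(c, 0L);
            if (total - size > threshold) {
                continue; // the other tables are too big to hold in memory
            }
            if (size > bestSize) {
                best = c;
                bestSize = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathToAliases = Map.of(
                "/warehouse/src", List.of("src1"),
                "/tmp/hive/-mr-10002", List.of("src2"));
        Map<String, Long> known = Map.of("src1", 5000L);
        Map<String, Long> sizes = aliasSizes(pathToAliases, known, "/tmp/hive",
                Map.of("/tmp/hive/-mr-10002", 200L));
        System.out.println(chooseBigTable(Set.of("src1", "src2"), sizes, 1000L)); // prints src1
    }
}
```

Here src2's size is only known from its intermediate path. With a threshold of 1000, src1 is chosen because the remaining 200 bytes fit under it, while src2 is rejected because the other tables total 5000.
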
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
