[ 
https://issues.apache.org/jira/browse/HIVE-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886799#comment-15886799
 ] 

Sergey Shelukhin edited comment on HIVE-16040 at 2/28/17 12:00 AM:
-------------------------------------------------------------------

[~ashutoshc] see the TODO that this patch removes.
Hive structures unions etc. with more than 2 sides as a sort of unbalanced 
binary tree, where the last query of the union is the right child and all the 
others form the left branch, and so on recursively.
E.g.
{noformat}
TOK_UNION
  TOK_UNION
      TOK_QUERY
      TOK_QUERY
  TOK_QUERY
{noformat}
However, that means that the first query of the union, which Hive uses to get 
column aliases, is not the one that the original patch was looking at. It was 
looking at the least nested select, which would be the rightmost (the last) 
query of the union.
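
To make the shape concrete, here is a minimal Python sketch of that left-deep nesting. It is not Hive code: `Node`, `build_union` and `leftmost_query` are hypothetical names used only for illustration, and the only things borrowed from Hive are the token names shown above.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass
class Node:
    token: str                      # "TOK_UNION" or "TOK_QUERY"
    label: str = ""                 # illustrative name of the query
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_union(queries):
    """Fold the queries left to right; each new query becomes the right child."""
    leaves = [Node("TOK_QUERY", q) for q in queries]
    return reduce(lambda acc, leaf: Node("TOK_UNION", left=acc, right=leaf), leaves)


def leftmost_query(node):
    """Follow left children until a query is reached."""
    while node.token == "TOK_UNION":
        node = node.left
    return node


tree = build_union(["q1", "q2", "q3"])
print(tree.right.label)            # q3: the last query is the least nested one
print(leftmost_query(tree).label)  # q1: the first query is the most nested one
```

With three queries, the root's right child is the last query, while the first query (the one the aliases should come from) sits at the bottom of the left branch.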

So what we do is find the parent of the select we found and, if that select is 
not the leftmost child, go into the left (first) subtree and find the select 
again, then repeat. 
For a 2-query union, it will find the correct select immediately.
It's a little bit wasteful: in a multi-way union we'd always find the wrong 
select, then backtrack to find the left side, then find the wrong select inside 
the left side, then backtrack again one level lower, and so on, until we reach 
the level where both children are at the same depth. But it should protect 
against finding selects in unexpected places such as subquery expressions.
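
The find-and-backtrack loop described above can be sketched roughly as follows. This is a simplified Python model and not the patch itself: `Node`, `find_select` and `find_leftmost_select` are hypothetical names, and modelling "find select" as a breadth-first search that returns the least nested query is an assumption made for illustration.

```python
from collections import deque


class Node:
    def __init__(self, token, label="", children=()):
        self.token = token              # "TOK_UNION" or "TOK_QUERY"
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def find_select(root):
    """Breadth-first search: return the least nested TOK_QUERY under root."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.token == "TOK_QUERY":
            return node
        queue.extend(node.children)
    return None


def find_leftmost_select(root):
    """Find a select, and while it is not the left child, retry in the left subtree."""
    visited = []                        # labels of every select found on the way
    subtree = root
    while True:
        found = find_select(subtree)
        if found is None:
            return None, visited
        visited.append(found.label)
        parent = found.parent
        if found is subtree or parent.children[0] is found:
            return found, visited
        subtree = parent.children[0]    # backtrack into the left (first) subtree


def q(label):
    return Node("TOK_QUERY", label)


three_way = Node("TOK_UNION", children=[
    Node("TOK_UNION", children=[q("q1"), q("q2")]),
    q("q3"),
])
select, visited = find_leftmost_select(three_way)
print(select.label, visited)            # q1 ['q3', 'q1']
```

In this model a 3-query union first hits the wrong select (`q3`) and then lands on `q1` after one backtrack, while a 2-query union returns `q1` on the first search, which matches the behaviour described above.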


> union column expansion should take aliases from the leftmost branch
> -------------------------------------------------------------------
>
>                 Key: HIVE-16040
>                 URL: https://issues.apache.org/jira/browse/HIVE-16040
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16040.01.patch, HIVE-16040.02.patch, 
> HIVE-16040.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
