[ 
https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2382:
---------------------------------

    Status: Open  (was: Patch Available)

@Charles: Please make sure the patch applies cleanly with 'patch -p0'. Thanks!


> Invalid predicate pushdown from incorrect column expression map for select 
> operator generated by GROUP BY operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2382
>                 URL: https://issues.apache.org/jira/browse/HIVE-2382
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2382v1.patch
>
>
> When a GROUP BY is specified, a select operator is added before the GROUP BY 
> in SemanticAnalyzer.insertSelectAllPlanForGroupBy.  Currently, the column 
> expression map for this new operator is set to the column expression map of 
> its parent operator.  This is incorrect: the parent operator could, for 
> example, rearrange the order of the columns (_col0 => _col0, _col1 => _col2, 
> _col2 => _col1), and the new operator should not repeat this rearrangement.
> The predicate pushdown optimization uses the column expression map to track 
> which columns a filter expression refers to at different operators, so the 
> wrong map results in a filter being pushed down onto incorrect columns.
> Here is a simple case of this going wrong: Using
> {noformat}
> create table invites (id int, foo int, bar int);
> {noformat}
> executing the query
> {noformat}
> explain select * from (select foo, bar from (select bar, foo from invites c 
> union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
> {noformat}
> results in
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a-subquery1:b-subquery1:c 
>           TableScan
>             alias: c
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>         a-subquery2:b-subquery2:d 
>           TableScan
>             alias: d
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that the filter is now "foo = 1", while the correct behavior is to have 
> "bar = 1".  If we remove the group by, the behavior is correct.
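
The mechanism described above can be reproduced with a toy model. The sketch below is not Hive code: it assumes each operator's column expression map can be represented as a plain dict from output column name to input column name, treats the Union as a pass-through, and uses a hypothetical helper push_down to trace the HAVING column back to the table scan. The three maps are read off the select operators in the plan above.

```python
def push_down(column, col_expr_maps):
    """Trace a column name up through a chain of operators.

    col_expr_maps is ordered from the operator nearest the filter up to
    the one nearest the table scan; each map is output name -> input name.
    """
    for col_expr_map in col_expr_maps:
        column = col_expr_map[column]
    return column


# Select directly above the table scan: select bar, foo from invites
scan_select = {"_col0": "bar", "_col1": "foo"}
# Select for "select foo, bar from (...) b": swaps the two columns
swap_select = {"_col0": "_col1", "_col1": "_col0"}
# Select inserted for the GROUP BY: it forwards columns unchanged,
# so its map should be the identity ...
inserted_select_correct = {"_col0": "_col0", "_col1": "_col1"}
# ... but per the report it is given a copy of its parent's map instead.
inserted_select_buggy = dict(swap_select)

# "having bar = 1" refers to _col1 at the input of the Group By operator.
correct = push_down("_col1", [inserted_select_correct, swap_select, scan_select])
buggy = push_down("_col1", [inserted_select_buggy, swap_select, scan_select])

print("correct filter column:", correct)
print("buggy filter column:", buggy)
```

With the identity map the predicate resolves to bar; with the copied map the swap is applied twice and the predicate resolves to foo, matching the "foo = 1" filter in the plan.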

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
