[ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2382:
---------------------------------

    Status: Open  (was: Patch Available)

@Charles: Please make sure the patch applies cleanly with 'patch -p0'. Thanks!

> Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2382
>                 URL: https://issues.apache.org/jira/browse/HIVE-2382
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2382v1.patch
>
>
> When a GROUP BY is specified, a select operator is added before the GROUP BY
> in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column
> expression map for this is set to the column expression map for the parent
> operator. This behavior is incorrect as, for example, the parent operator
> could rearrange the order of the columns (_col0 => _col0, _col1 => _col2,
> _col2 => _col1) and the new operator should not repeat this.
> The predicate pushdown optimization uses the column expression map to track
> which columns a filter expression refers to at different operators. This
> results in a filter on incorrect columns.
> Here is a simple case of this going wrong: Using
> {noformat}
> create table invites (id int, foo int, bar int);
> {noformat}
> executing the query
> {noformat}
> explain select * from (select foo, bar from (select bar, foo from invites c
> union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
> {noformat}
> results in
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a-subquery1:b-subquery1:c
>           TableScan
>             alias: c
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>         a-subquery2:b-subquery2:d
>           TableScan
>             alias: d
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that the filter is now "foo = 1", while the correct behavior is to have
> "bar = 1". If we remove the group by, the behavior is correct.
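
The mapping problem described in the issue can be illustrated with a small standalone model. This is only a sketch, not Hive code: the class ColExprMapSketch and its methods mapOf, pushDown and resolveHavingColumn are hypothetical names, and each operator is reduced to a plain map from its output column name to the parent column it reads. It assumes that predicate pushdown rewrites a column reference by walking these maps from the filter down to the table scan, and that the correct map for the inserted select is the identity, which is one reading of the plan in the report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these names exist in Hive.
final class ColExprMapSketch {

    // Builds a map from alternating (outputColumn, parentColumn) pairs.
    static Map<String, String> mapOf(String... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of names");
        }
        Map<String, String> map = new HashMap<String, String>();
        for (int i = 0; i < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    // Rewrites a column reference through each operator's map, starting at the
    // operator nearest the filter and ending at the one nearest the table scan.
    static String pushDown(String column, List<Map<String, String>> mapsTopDown) {
        String current = column;
        for (Map<String, String> colExprMap : mapsTopDown) {
            String parentColumn = colExprMap.get(current);
            if (parentColumn == null) {
                throw new IllegalArgumentException("unmapped column: " + current);
            }
            current = parentColumn;
        }
        return current;
    }

    // Returns the table column that "having bar = 1" ends up filtering on.
    // copyParentMap = true models the reported behavior (the inserted select
    // reuses its parent's map); false models an identity map (assumed fix).
    static String resolveHavingColumn(boolean copyParentMap) {
        // select bar, foo from invites
        Map<String, String> innerSelect = mapOf("_col0", "bar", "_col1", "foo");
        // select foo, bar from (...) b: swaps the two columns
        Map<String, String> outerSelect = mapOf("_col0", "_col1", "_col1", "_col0");
        // select operator inserted before the GROUP BY
        Map<String, String> insertedSelect = copyParentMap
                ? new HashMap<String, String>(outerSelect)
                : mapOf("_col0", "_col0", "_col1", "_col1");
        // bar is _col1 in the output of the inserted select
        return pushDown("_col1", Arrays.asList(insertedSelect, outerSelect, innerSelect));
    }

    public static void main(String[] args) {
        System.out.println("copied parent map -> " + resolveHavingColumn(true));
        System.out.println("identity map      -> " + resolveHavingColumn(false));
    }
}
```

With the copied map the column swap is applied twice, so the reference to bar resolves to foo, which matches the "foo = 1" filter in the plan. With the identity map the swap is applied once and the reference resolves to bar.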