[ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087264#comment-13087264 ]
jirapos...@reviews.apache.org commented on HIVE-2382:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1566/
-----------------------------------------------------------

(Updated 2011-08-18 20:29:54.207958)

Review request for hive.

Changes
-------

Unit tests passed

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2382

This addresses bug HIVE-2382.
    https://issues.apache.org/jira/browse/HIVE-2382

Diffs
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1157990
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/groupby_ppd.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/groupby_ppd.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1566/diff

Testing (updated)
-------

Unit tests passed

Thanks,

Charles

> Invalid predicate pushdown from incorrect column expression map for select
> operator generated by GROUP BY operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2382
>                 URL: https://issues.apache.org/jira/browse/HIVE-2382
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2382v1.patch
>
>
> When a GROUP BY is specified, a select operator is added before the GROUP BY
> in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column
> expression map for this is set to the column expression map for the parent
> operator. This behavior is incorrect as, for example, the parent operator
> could rearrange the order of the columns (_col0 => _col0, _col1 => _col2,
> _col2 => _col1) and the new operator should not repeat this.
>
> The predicate pushdown optimization uses the column expression map to track
> which columns a filter expression refers to at different operators. This
> results in a filter on incorrect columns.
>
> Here is a simple case of this going wrong: Using
> {noformat}
> create table invites (id int, foo int, bar int);
> {noformat}
> executing the query
> {noformat}
> explain select * from (select foo, bar from (select bar, foo from invites c
> union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
> {noformat}
> results in
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a-subquery1:b-subquery1:c
>           TableScan
>             alias: c
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>         a-subquery2:b-subquery2:d
>           TableScan
>             alias: d
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that the filter is now "foo = 1", while the correct behavior is to have
> "bar = 1". If we remove the group by, the behavior is correct.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
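
The column-map problem described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's code: the class ColExprMapSketch and the helpers swapMap, identityMap and resolve are hypothetical names, and each operator is reduced to a plain map from its output column name to the input column it reads. Walking a filter column through the inserted select with a copy of the parent's map applies the parent's reordering twice, so the filter ends up on the other column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of column expression maps; all names here are hypothetical. */
public class ColExprMapSketch {

    /** Map of a select that swaps its two input columns. */
    static Map<String, String> swapMap() {
        Map<String, String> m = new HashMap<>();
        m.put("_col0", "_col1");
        m.put("_col1", "_col0");
        return m;
    }

    /** Map of a pass-through select (what the inserted select-all should use). */
    static Map<String, String> identityMap() {
        Map<String, String> m = new HashMap<>();
        m.put("_col0", "_col0");
        m.put("_col1", "_col1");
        return m;
    }

    /**
     * Follows a column name upward through the operators' maps,
     * listed child first, the way a pushdown pass would trace it.
     */
    static String resolve(String column, List<Map<String, String>> mapsChildFirst) {
        String current = column;
        for (Map<String, String> colExprMap : mapsChildFirst) {
            current = colExprMap.getOrDefault(current, current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Inserted select reuses the parent's (swapping) map: the swap is applied twice.
        String reused = resolve("_col0", Arrays.asList(swapMap(), swapMap()));
        // Inserted select has its own identity map: the swap is applied once.
        String own = resolve("_col0", Arrays.asList(identityMap(), swapMap()));
        System.out.println("reused parent map -> " + reused);
        System.out.println("own identity map  -> " + own);
    }
}
```

In this model the two traces disagree, which mirrors the "foo = 1" versus "bar = 1" difference in the plan above. How the actual patch builds the map for the inserted select is not shown here; see the diff linked in the review request.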