[jira] [Commented] (HIVE-2383) Incorrect alias filtering for predicate pushdown

[email protected] (JIRA) Tue, 30 Aug 2011 22:49:51 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094320#comment-13094320
 ]


[email protected] commented on HIVE-2383:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1568/#review1700
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
<https://reviews.apache.org/r/1568/#comment3855>

    The predicates are already cloned in 
ExprWalkerProcFactory.extractPushdownPreds() and this behavior is inconsistent 
with addPushdowns() which doesn't clone the predicates.  Moreover, not cloning 
the expressions is convenient for the newToOldExprMap so that now the 
ExprNodeDesc's can be compared with == instead of a deep comparison of the tree.



http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
<https://reviews.apache.org/r/1568/#comment3856>

    From what I understand, currently, when the PPD encounters a JoinOperator, 
mergeChildrenPred is used to find the filters that are not being pushed down 
past that operator based on the alias and these should be the same filters that 
are stored in ExprWalkerInfo.nonFinalPreds (except the filters we want filter 
on the output columns of the JoinOperator while the nonFinalPreds filter on its 
input).  The bug in HIVE-2383 is that alias filtering should be done in the 
context of the JoinOperator, not on its children, so now mergeChildrenPred 
doesn't return the correct filters to create.  I changed the JoinPPD to use the 
nonFinalPreds to figure out which filters to create and added a map in 
ExprWalkerInfo.newToOldExprMap to store the mapping from each filter in the 
operator's context to the filter that it came from in the children's context 
(this is populated during ExprWalkerProcFactory.extractPushdownPreds()).


- Charles


On 2011-08-31 05:01:19, Charles Chen wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1568/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-31 05:01:19)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  https://issues.apache.org/jira/browse/HIVE-2383
bq.  
bq.  
bq.  This addresses bug HIVE-2383.
bq.      https://issues.apache.org/jira/browse/HIVE-2383
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_repeated_alias.q
 PRE-CREATION 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/auto_join8.q.out
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join8.q.out
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_repeated_alias.q.out
 PRE-CREATION 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/udf_bitmap_empty.q.out
 1163438 
bq.    
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml
 1163438 
bq.  
bq.  Diff: https://reviews.apache.org/r/1568/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Charles
bq.  
bq.



> Incorrect alias filtering for predicate pushdown
> ------------------------------------------------
>
>                 Key: HIVE-2383
>                 URL: https://issues.apache.org/jira/browse/HIVE-2383
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch
>
>
> The predicate pushdown optimizer starts at the topmost operators traverses 
> the operator tree, at each stage collecting predicates to be pushed down.  At 
> each operator, ive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is 
> called, which merges the predicates of the children nodes into the current 
> node.  The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds 
> as a map from the alias a predicate refers to (a predicate may only refer to 
> one alias at a time as only such predicates can be pushed) to a list of such 
> predicates.  Since at each stage the alias the predicate refers to may change 
> (subqueries may change aliases), this is updated for each operator 
> (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called which walks 
> the ExprNodeDesc for each predicate). When a JoinOperator is encountered, 
> mergeWithChildrenPred is passed an optional parameter "aliases" which 
> contains a set of aliases that can be pushed per ansi semantics (see 
> hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases).  The part that is 
> incorrect is that aliases are filtered in mergeWithChildrenPred before 
> extractPushdownPreds is called, which associates the predicates with the 
> correct alias in the current operator's context while the filtering should 
> happen after.
> In test case Q2 below, when the predicate "a.bar=3" comes into the 
> JoinOperator, the alias is "a" coming in so it is accepted for pushdown.  
> When brought into the JoinOperator's context, however, since the predicate 
> refers to b.foo in the inner scope, we should not actually accept this for 
> pushdown.
> With the test cases
> {noformat}
> -- Q1: predicate should not be pushed on the right side of a left outer join 
> (this is correct in trunk)
> explain
> SELECT a.foo as foo1, b.foo as foo2, b.bar
> FROM pokes a LEFT OUTER JOIN pokes2 b
> ON a.foo=b.foo
> WHERE b.bar=3;
> -- Q2: predicate should not be pushed on the right side of a left outer join 
> (this is broken in trunk)
> explain
> SELECT * FROM
>     (SELECT a.foo as foo1, b.foo as foo2, b.bar
>     FROM pokes a LEFT OUTER JOIN pokes2 b
>     ON a.foo=b.foo) a
> WHERE a.bar=3;
> -- Q3: predicate should be pushed (this is correct in trunk)
> explain
> SELECT * FROM
>     (SELECT a.foo as foo1, b.foo as foo2, a.bar
>     FROM pokes a JOIN pokes2 b
>     ON a.foo=b.foo) a
> WHERE a.bar=3;
> {noformat}
> The current output is
> {noformat}
> hive> 
>     > -- Q1: predicate should not be pushed on the right side of a left outer 
> join
>     > explain
>     > SELECT a.foo as foo1, b.foo as foo2, b.bar
>     > FROM pokes a LEFT OUTER JOIN pokes2 b
>     > ON a.foo=b.foo
>     > WHERE b.bar=3;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) 
> (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. 
> (TOK_TABLE_OR_COL b) foo)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
> TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) 
> (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. 
> (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. (TOK_TABLE_OR_COL b) bar) 3))))
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a 
>           TableScan
>             alias: a
>             Reduce Output Operator
>               key expressions:
>                     expr: foo
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: foo
>                     type: int
>               tag: 0
>               value expressions:
>                     expr: foo
>                     type: int
>         b 
>           TableScan
>             alias: b
>             Reduce Output Operator
>               key expressions:
>                     expr: foo
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: foo
>                     type: int
>               tag: 1
>               value expressions:
>                     expr: foo
>                     type: int
>                     expr: bar
>                     type: int
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Left Outer Join0 to 1
>           condition expressions:
>             0 {VALUE._col0}
>             1 {VALUE._col0} {VALUE._col1}
>           handleSkewJoin: false
>           outputColumnNames: _col0, _col4, _col5
>           Filter Operator
>             predicate:
>                 expr: (_col5 = 3)
>                 type: boolean
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col4
>                     type: int
>                     expr: _col5
>                     type: int
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> Time taken: 0.113 seconds
> hive> 
>     > -- Q2: predicate should not be pushed on the right side of a left outer 
> join
>     > explain
>     > SELECT * FROM
>     >     (SELECT a.foo as foo1, b.foo as foo2, b.bar
>     >     FROM pokes a LEFT OUTER JOIN pokes2 b
>     >     ON a.foo=b.foo) a
>     > WHERE a.bar=3;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN 
> (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. 
> (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo)))) (TOK_INSERT 
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. 
> (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) 
> foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) bar))))) a)) (TOK_INSERT 
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
> TOK_ALLCOLREF)) (TOK_WHERE (= (. (TOK_TABLE_OR_COL a) bar) 3))))
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a:a 
>           TableScan
>             alias: a
>             Reduce Output Operator
>               key expressions:
>                     expr: foo
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: foo
>                     type: int
>               tag: 0
>               value expressions:
>                     expr: foo
>                     type: int
>         a:b 
>           TableScan
>             alias: b
>             Filter Operator
>               predicate:
>                   expr: (bar = 3)
>                   type: boolean
>               Reduce Output Operator
>                 key expressions:
>                       expr: foo
>                       type: int
>                 sort order: +
>                 Map-reduce partition columns:
>                       expr: foo
>                       type: int
>                 tag: 1
>                 value expressions:
>                       expr: foo
>                       type: int
>                       expr: bar
>                       type: int
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Left Outer Join0 to 1
>           condition expressions:
>             0 {VALUE._col0}
>             1 {VALUE._col0} {VALUE._col1}
>           handleSkewJoin: false
>           outputColumnNames: _col0, _col4, _col5
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col4
>                   type: int
>                   expr: _col5
>                   type: int
>             outputColumnNames: _col0, _col1, _col2
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: int
>                     expr: _col2
>                     type: int
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> Time taken: 0.101 seconds
> hive> 
>     > -- Q3: predicate should be pushed
>     > explain
>     > SELECT * FROM
>     >     (SELECT a.foo as foo1, b.foo as foo2, a.bar
>     >     FROM pokes a JOIN pokes2 b
>     >     ON a.foo=b.foo) a
>     > WHERE a.bar=3;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN 
> (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. 
> (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo)))) (TOK_INSERT 
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. 
> (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) 
> foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) bar))))) a)) (TOK_INSERT 
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
> TOK_ALLCOLREF)) (TOK_WHERE (= (. (TOK_TABLE_OR_COL a) bar) 3))))
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a:a 
>           TableScan
>             alias: a
>             Filter Operator
>               predicate:
>                   expr: (bar = 3)
>                   type: boolean
>               Reduce Output Operator
>                 key expressions:
>                       expr: foo
>                       type: int
>                 sort order: +
>                 Map-reduce partition columns:
>                       expr: foo
>                       type: int
>                 tag: 0
>                 value expressions:
>                       expr: foo
>                       type: int
>                       expr: bar
>                       type: int
>         a:b 
>           TableScan
>             alias: b
>             Reduce Output Operator
>               key expressions:
>                     expr: foo
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: foo
>                     type: int
>               tag: 1
>               value expressions:
>                     expr: foo
>                     type: int
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Inner Join 0 to 1
>           condition expressions:
>             0 {VALUE._col0} {VALUE._col1}
>             1 {VALUE._col0}
>           handleSkewJoin: false
>           outputColumnNames: _col0, _col1, _col4
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col4
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1, _col2
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: int
>                     expr: _col2
>                     type: int
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that Q2 is incorrect because the predicate "bar = 3" is incorrectly 
> pushed to the right side of the left outer join (Q1 and Q3 are correct).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2383) Incorrect alias filtering for predicate pushdown

Reply via email to