Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

Vineet Garg Mon, 05 Nov 2018 15:47:36 -0800


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
> > Lines 774 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103050#file2103050line774>
> >
> >     Cardinality and NDV may change. This just checks whether column 
> > origin/lineage can be backtrack to single input. We should use 
> > RelMdColumnOrigins or RelMdExpressionLineage instead of writing this new 
> > class.


The reason I decided not to use RelMdColumnOrigins or RelMdExpressionLineage is 
because both of them operate on single column/rex node. To get to column origin 
for a set of columns/RexRefs we will need to call them repeatadely. I feel this 
is big overkill and therefore wrote another class to take on set of columns.

I agree about the name being misleading. I'll rename the class to better 
communicate its purpose.


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
> > Lines 346 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line346>
> >
> >     This should never be empty?

It could be empty if 'CardinalityChange' couldn't backtrack. However if can 
never be null.


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
> > Lines 359 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line359>
> >
> >     This logic does not seem correct. What happens when a Project permutes 
> > input references and the GBy columns would not be in same order as the key? 
> > For instance consider GBy key is (0,1,2) which maps into columns (2,0,4) in 
> > TS. Key is (0,4). You would remove column 1 from GBy, but you should remove 
> > column 0. Using ColumnOrigins or ExpressionLineage providers to get the 
> > mapping would solve the issue.

The whole for loop actually takes care of this case. CardinalityChange returns 
you backtracked (if it could) columns (2,0,4) in this case. Then we loop over 
this backtracked columns and remove the corresponding column from the GBy key 
(based on the position) if not part of key. For instance in this case first 
column in backtracked columns (i.e. 2) is not part of key therefore 
corresponding column (i.e. 0) will be removed from GBy key.


- Vineet


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210258
-----------------------------------------------------------


On Nov. 5, 2018, 10:57 p.m., Vineet Garg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2018, 10:57 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
>     https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See Jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 
> 9aa30129b6 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
>  b7c31bdfca 
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5 
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 
> 96caa4d6dd 
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>

Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

Reply via email to