> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java > > Lines 774 (patched) > > <https://reviews.apache.org/r/69202/diff/1/?file=2103050#file2103050line774> > > > > Cardinality and NDV may change. This just checks whether column > > origin/lineage can be backtrack to single input. We should use > > RelMdColumnOrigins or RelMdExpressionLineage instead of writing this new > > class.
The reason I decided not to use RelMdColumnOrigins or RelMdExpressionLineage is because both of them operate on single column/rex node. To get to column origin for a set of columns/RexRefs we will need to call them repeatadely. I feel this is big overkill and therefore wrote another class to take on set of columns. I agree about the name being misleading. I'll rename the class to better communicate its purpose. > On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java > > Lines 346 (patched) > > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line346> > > > > This should never be empty? It could be empty if 'CardinalityChange' couldn't backtrack. However if can never be null. > On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java > > Lines 359 (patched) > > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line359> > > > > This logic does not seem correct. What happens when a Project permutes > > input references and the GBy columns would not be in same order as the key? > > For instance consider GBy key is (0,1,2) which maps into columns (2,0,4) in > > TS. Key is (0,4). You would remove column 1 from GBy, but you should remove > > column 0. Using ColumnOrigins or ExpressionLineage providers to get the > > mapping would solve the issue. The whole for loop actually takes care of this case. CardinalityChange returns you backtracked (if it could) columns (2,0,4) in this case. Then we loop over this backtracked columns and remove the corresponding column from the GBy key (based on the position) if not part of key. For instance in this case first column in backtracked columns (i.e. 2) is not part of key therefore corresponding column (i.e. 0) will be removed from GBy key. - Vineet ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69202/#review210258 ----------------------------------------------------------- On Nov. 5, 2018, 10:57 p.m., Vineet Garg wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69202/ > ----------------------------------------------------------- > > (Updated Nov. 5, 2018, 10:57 p.m.) > > > Review request for hive and Jesús Camacho Rodríguez. > > > Bugs: HIVE-20804 > https://issues.apache.org/jira/browse/HIVE-20804 > > > Repository: hive-git > > > Description > ------- > > See Jira > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java > 9aa30129b6 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java > b7c31bdfca > ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5 > ql/src/test/results/clientpositive/llap/constraints_optimization.q.out > 96caa4d6dd > > > Diff: https://reviews.apache.org/r/69202/diff/2/ > > > Testing > ------- > > > Thanks, > > Vineet Garg > >