Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23317 )
Change subject: IMPALA-14115: Calcite planner: Added top-n analytic PlanNode optimization. ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/23317/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23317/2//COMMIT_MSG@26 PS2, Line 26: This has been tested against the test_limit_pushdown_analytic.py file > The e2e tests is good but it does not verify if the top-n was created as ex Added the 2 junit tests. There are various differences in the files and I committed them in a way that will hopefully be easier to review. Version 2 is the limit-pushdown-analytic.test run with the original planner, just a copy. Version 3 is the limit-pushdown-analytic.test run with the Calcite planner for easy comparison. Version 5 is the initial analytics-rank-pushdown.test file just copied from the original planner Version 6 is the original planner run with analytic_rank_pushdown_threshold=0 Version 7 is the final run with threshold set to 0. So hopefully you can compare runs a little easier. http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java: http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@153 PS2, Line 153: (e.g. ranking > 5) > The optimization is applicable for < or <= type filters, not for > because Heh, in my mind, I was thinking that 1 is a greater ranking than 5, but from a programming perspective, you are correct. Made the change. http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@161 PS2, Line 161: not a rank > nit: can we mention both RANK and ROW_NUMBER ? Done http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@172 PS2, Line 172: simplifiedAnalyzer.setUnassignedConjuncts(converter.getImpalaConjuncts()); > This will overwrite the unassigned conjuncts .. is this the desired behavio Unlike the original Impala planner, unassignedConjuncts is not used across Planner nodes. So the unassignedConjuncts should always be empty here. It is also cleared out a few lines below in simplifiedAnalyzer.clearUnassignedConjuncts(). You can also see within our Calcite SimplifiedAnalyzer object that unassignedConjuncts is kept and managed separately from anything in the Analyzer. There are various *UnassignedConjuncts() methods that are only in SimplifiedAnalyzer and not in Analyzer. This has precedence as can be seen in ImpalaAggRel. I should have had more comments since this isn't obvious in the implementation. I hope this is beneficial for the next time someone has to look at this code. I also added a preconditions check to assure there are no unassigned conjuncts coming in. http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@443 PS2, Line 443: AnalyticExpr must be unique within the AnalyticExpr > This comment seems either incomplete or incorrect. Did you mean unique wit Nah, I meant that while they can be duplicated with the list of projects, the AnalyticPlanner needs them to be unique. Changed AnalyticExpr to AnalyticPlanner here. http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@453 PS2, Line 453: if (!uniqueAnalyticExprs.contains(analyticExpr)) { > If we have 2 OVER exprs which are identical but have different aliases, wou This is the exact type scenario for which the uniqueAnalyticExprs is checking. The AnalyticPlanner cannot handle adding two separate expressions that are exactly the same, even when the aliases are different. I verified this anyway, but for your query, only one uniqueAnalyticExpr is added and it produces the correct results with the correct labels. http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@493 PS2, Line 493: SqlKind.RANK > nit: or a SqlKind.ROW_NUMBER Done -- To view, visit http://gerrit.cloudera.org:8080/23317 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6fa6781db56771b13b0cf49bd236f776016bf8d Gerrit-Change-Number: 23317 Gerrit-PatchSet: 2 Gerrit-Owner: Steve Carlin <scar...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com> Gerrit-Comment-Date: Tue, 30 Sep 2025 23:46:29 +0000 Gerrit-HasComments: Yes