Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23317 )

Change subject: IMPALA-14115: Calcite planner: Added top-n analytic PlanNode 
optimization.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23317/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23317/2//COMMIT_MSG@26
PS2, Line 26: This has been tested against the test_limit_pushdown_analytic.py 
file
> The e2e tests is good but it does not verify if the top-n was created as ex
Added the 2 junit tests.

There are various differences in the files and I committed them in a way that 
will hopefully be easier to review.

Version 2 is the limit-pushdown-analytic.test run with the original planner, 
just a copy.

Version 3 is the limit-pushdown-analytic.test run with the Calcite planner for 
easy comparison.

Version 5 is the initial analytics-rank-pushdown.test file just copied from the 
original planner

Version 6 is the original planner run with analytic_rank_pushdown_threshold=0

Version 7 is the final run with threshold set to 0.  So hopefully you can 
compare runs a little easier.


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java:

http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@153
PS2, Line 153: (e.g. ranking > 5)
> The optimization is applicable for < or <= type filters, not for >  because
Heh, in my mind, I was thinking that 1 is a greater ranking than 5, but from a 
programming perspective, you are correct.  Made the change.


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@161
PS2, Line 161: not a rank
> nit: can we mention both RANK and ROW_NUMBER ?
Done


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@172
PS2, Line 172:         
simplifiedAnalyzer.setUnassignedConjuncts(converter.getImpalaConjuncts());
> This will overwrite the unassigned conjuncts .. is this the desired behavio
Unlike the original Impala planner, unassignedConjuncts is not used across 
Planner nodes.  So the unassignedConjuncts should always be empty here.  It is 
also cleared out a few lines below in 
simplifiedAnalyzer.clearUnassignedConjuncts().

You can also see within our Calcite SimplifiedAnalyzer object that 
unassignedConjuncts is kept and managed separately from anything in the 
Analyzer.  There are various *UnassignedConjuncts() methods that are only in 
SimplifiedAnalyzer and not in Analyzer.  This has precedence as can be seen in 
ImpalaAggRel.

I should have had more comments since this isn't obvious in the implementation. 
I hope this is beneficial for the next time someone has to look at this code.  
I also added a preconditions check to assure there are no unassigned conjuncts 
coming in.


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@443
PS2, Line 443: AnalyticExpr must be unique within the AnalyticExpr
> This comment seems either incomplete or incorrect.  Did you mean unique wit
Nah, I meant that while they can be duplicated with the list of projects, the 
AnalyticPlanner needs them to be unique.  Changed AnalyticExpr to 
AnalyticPlanner here.


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@453
PS2, Line 453:         if (!uniqueAnalyticExprs.contains(analyticExpr)) {
> If we have 2 OVER exprs which are identical but have different aliases, wou
This is the exact type scenario for which the uniqueAnalyticExprs is checking. 
The AnalyticPlanner cannot handle adding two separate expressions that are 
exactly the same, even when the aliases are different.

I verified this anyway, but for your query, only one uniqueAnalyticExpr is 
added and it produces the correct results with the correct labels.


http://gerrit.cloudera.org:8080/#/c/23317/2/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaAnalyticRel.java@493
PS2, Line 493: SqlKind.RANK
> nit: or a SqlKind.ROW_NUMBER
Done



--
To view, visit http://gerrit.cloudera.org:8080/23317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6fa6781db56771b13b0cf49bd236f776016bf8d
Gerrit-Change-Number: 23317
Gerrit-PatchSet: 2
Gerrit-Owner: Steve Carlin <scar...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Sep 2025 23:46:29 +0000
Gerrit-HasComments: Yes

Reply via email to