Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21239 )

Change subject: IMPALA-13043: Implement Join Capability to the Calcite Planner
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG@28
PS16, Line 28: There is some special case logic in the join that requires the 
execution
             : engine to run on a single node. The original logic can be found 
in the
             : JoinNode planner object, but this code isn't executed. The new 
mechanism
             : for checking single node executions is passed back to the root 
via the
             : NodeWithExprs object, and is checked in the 
ImpalaJoinRel.useSingleNode
             : method.
> I'm comparing this to how the legacy planner worked. Is there a specific SQ
Ok...I had to reexamine this. I didn't code this part the first time and put 
this in based on some comments I saw in a draft that was created outside of 
this commit...

This logic is based on the logic in SingleNodePlanner.validatePlan(), which is 
the only case the num nodes is used right now.  I haven't checked e2e tests to 
see where this is tested.

But on second glance, we throw an exception in this case.  Now I'm wondering if 
I should revisit this logic and instead of setting numNodes to 1, if we should 
set an exception?  It would simplify the code. I'm not sure it would break 
something else, but maybe I should make this change.

I do want your thoughts on this because perhaps I'm missing some nuance in the 
existing logic.  But if there is, we'll probably catch it later when 
implementing all e2e tests.


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java@65
PS16, Line 65:     return 1;
> Nit: Does it matter that this is 1 versus UNKNOWN_COST? (Not a real issue s
Yeah, IIRC (and I may be misremembering) I think it crashed in 
PlanNode.orderConjunctsByCost() because it needed an actual cost defined.


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java@258
PS16, Line 258:       // Mapping it to a Left Semi Join in Impala seems to make
              :       // sense since it is unclear when we would need a
              :       // Right Semi Join
              :       return JoinOperator.LEFT_SEMI_JOIN;
> Looking at the Calcite code, it looks like Calcite's SEMI join is by defini
Yeah, we're probably going to have to make some Calcite changes to handle this. 
 From what I've seen of Calcite, it doesn't handle RIGHT_SEMI_JOIN.

I can put a TODO here and file a Jira if you think that's what is best.


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@120
PS16, Line 120:     if (useSingleNode) {
              :       plannerContext.getQueryOptions().setNum_nodes(1);
              :     }
> For the plans that set num_nodes=1 here, would any of them get fixed by the
My hope is that by the end of this project, there is no need to call 
"invertJoins", though this may be impossible. I haven't looked at invertJoins 
too closely.  Are there physical node reasons why we would invert the join?  
Because if it's all based on stats and even if there are query options, it 
really should be handled in the logical optimization of the plan. And if so, 
then by the time it hits this portion, the plan should be set.

But see my comment above in the "commit message".  Perhaps I should be throwing 
an exception here.

And yes, I think later join tests will help test this.



--
To view, visit http://gerrit.cloudera.org:8080/21239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5db097577907d79877f52feff2922000af074ecd
Gerrit-Change-Number: 21239
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Fri, 30 Aug 2024 15:15:39 +0000
Gerrit-HasComments: Yes

Reply via email to