Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21239 )

Change subject: IMPALA-13043: Implement Join Capability to the Calcite Planner
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG@28
PS16, Line 28: There is some special case logic in the join that requires the 
execution
             : engine to run on a single node. The original logic can be found 
in the
             : JoinNode planner object, but this code isn't executed. The new 
mechanism
             : for checking single node executions is passed back to the root 
via the
             : NodeWithExprs object, and is checked in the 
ImpalaJoinRel.useSingleNode
             : method.
I'm comparing this to how the legacy planner worked. Is there a specific SQL 
that needs the num_nodes=1 treatment?


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java@65
PS16, Line 65:     return 1;
Nit: Does it matter that this is 1 versus UNKNOWN_COST? (Not a real issue since 
we'll be doing more here later, just curious.)


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java@258
PS16, Line 258:       // Mapping it to a Left Semi Join in Impala seems to make
              :       // sense since it is unclear when we would need a
              :       // Right Semi Join
              :       return JoinOperator.LEFT_SEMI_JOIN;
Looking at the Calcite code, it looks like Calcite's SEMI join is by definition 
a LEFT SEMI (and it's ANTI is a LEFT ANTI).

Since this is implemented via a hash join, the same considerations apply as for 
any hash join. We would rather have a smaller build side to avoid memory usage 
for the hash table.

So, if we have a left semi join and the right input is very large, then we 
would rather invert the join and use a right semi join.


http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@120
PS16, Line 120:     if (useSingleNode) {
              :       plannerContext.getQueryOptions().setNum_nodes(1);
              :     }
For the plans that set num_nodes=1 here, would any of them get fixed by the 
"invertJoins" call down in createPlans()? In other words, can there be cases 
where this would restrict itself to one node when the legacy planner would run 
it on more than one?

I guess this is something we will catch down the line when we run more test 
suites.



--
To view, visit http://gerrit.cloudera.org:8080/21239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5db097577907d79877f52feff2922000af074ecd
Gerrit-Change-Number: 21239
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Fri, 30 Aug 2024 00:44:11 +0000
Gerrit-HasComments: Yes

Reply via email to