Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21239 )
Change subject: IMPALA-13043: Implement Join Capability to the Calcite Planner ...................................................................... Patch Set 16: (4 comments) http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21239/16//COMMIT_MSG@28 PS16, Line 28: There is some special case logic in the join that requires the execution : engine to run on a single node. The original logic can be found in the : JoinNode planner object, but this code isn't executed. The new mechanism : for checking single node executions is passed back to the root via the : NodeWithExprs object, and is checked in the ImpalaJoinRel.useSingleNode : method. I'm comparing this to how the legacy planner worked. Is there a specific SQL that needs the num_nodes=1 treatment? http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java: http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/AnalyzedBinaryCompExpr.java@65 PS16, Line 65: return 1; Nit: Does it matter that this is 1 versus UNKNOWN_COST? (Not a real issue since we'll be doing more here later, just curious.) http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java: http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaJoinRel.java@258 PS16, Line 258: // Mapping it to a Left Semi Join in Impala seems to make : // sense since it is unclear when we would need a : // Right Semi Join : return JoinOperator.LEFT_SEMI_JOIN; Looking at the Calcite code, it looks like Calcite's SEMI join is by definition a LEFT SEMI (and it's ANTI is a LEFT ANTI). Since this is implemented via a hash join, the same considerations apply as for any hash join. We would rather have a smaller build side to avoid memory usage for the hash table. So, if we have a left semi join and the right input is very large, then we would rather invert the join and use a right semi join. http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java: http://gerrit.cloudera.org:8080/#/c/21239/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@120 PS16, Line 120: if (useSingleNode) { : plannerContext.getQueryOptions().setNum_nodes(1); : } For the plans that set num_nodes=1 here, would any of them get fixed by the "invertJoins" call down in createPlans()? In other words, can there be cases where this would restrict itself to one node when the legacy planner would run it on more than one? I guess this is something we will catch down the line when we run more test suites. -- To view, visit http://gerrit.cloudera.org:8080/21239 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5db097577907d79877f52feff2922000af074ecd Gerrit-Change-Number: 21239 Gerrit-PatchSet: 16 Gerrit-Owner: Steve Carlin <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Fri, 30 Aug 2024 00:44:11 +0000 Gerrit-HasComments: Yes
