Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22091 )
Change subject: IMPALA-13531: Calcite CTE frontend ...................................................................... Patch Set 10: (23 comments) Right before I hit "post', I noticed I forgot to post some review comments from last year, so if some of them are stale, I do apologize. http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc@93 PS10, Line 93: DEFINE_string(cte_suggester_class, "org.apache.impala.calcite.cte.IdentitySuggester", Is there a reason we are making this a backend config option rather than a query option? http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java File fe/src/main/java/org/apache/impala/planner/CTEScanNode.java: http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java@59 PS7, Line 59: Preconditions.checkArgument(cteExprs_.size() == desc.getSlots().size()); Nit: CTEScanNode.class http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/planner/SequenceNode.java File fe/src/main/java/org/apache/impala/planner/SequenceNode.java: http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/planner/SequenceNode.java@53 PS10, Line 53: assignConjuncts(analyzer); If SequenceNode is at the top, will there be any conjuncts to assign? I wouldn't think this is necessary. http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/service/Frontend.java@169 PS10, Line 169: import org.apache.impala.planner.CTEScanNode; Nit: Unused http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java@21 PS7, Line 21: public class Exception extends ImpalaException { Gonna nit this one because I can see the argument that since it's already under a "cte" directory, calling this CTEException would be redundant. Having said that, if the "catch" part says catch(Exception e) That's gonna be a little confusing. Wouldn't stop the code review due to this, but if you argue this one back, I may give in easily. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java@21 PS10, Line 21: public class Exception extends ImpalaException { Optional: I guess I'm ok with this, but it could be confusing to have the "Exception" class name overloaded. If I were to use this in a file that caught the exception but also wanted to capture a general Exception, there would be a naming conflict. I'll leave this up to you if you want to change it. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java@33 PS7, Line 33: public final class Registry { Same argument that I made with Exception, though I'm a little more forgiving here since it is unlikely (but not impossible) that there are 2 Registry classes. I think about this whenever I have to include "Table" in certain places and it is used at both a high level, and a metastore level which leads to really annoying full path package code if I have to include both Table classes. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java@39 PS7, Line 39: private final Map<String, RelNode> ctes = new HashMap<>(); Nit: ctes_ http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@35 PS7, Line 35: if (name == null || name.isEmpty()) { Optional: StringUtils.isEmpty() http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@36 PS7, Line 36: return (query, conf) -> Collections.emptyList(); My java brain is fuzzing out here. Does this compile? http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@27 PS10, Line 27: public static final String CTE_THRESHOLD = "impala.cte_threshold"; Is the threshold used? Is this meant for a safety valve or a future check? http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@86 PS7, Line 86: public void setProducer(ImpalaCTEProducer producer) { Does anyone call this? I would like to avoid making this a mutable class http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@36 PS10, Line 36: public class ImpalaCTEConsumer extends TableScan implements ImpalaPlanRel { > I based this on a TableScan because I figured RelOptMaterialization in http SJC Placeholder: Check if RelOptMaterialization.qualifiedTableName is used...how is digest being used as well. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@74 PS10, Line 74: producer_.getRowType(), producer_.getChild(), getName(), producer_.nextIndex()); > This mutates the producer, but couldn't figure out a better way to provide So the general mechanism I've tried to use to avoid mutation is to use ParentPlanRelContext to pass information down the RelNode tree and NodeWithExprs (which prolly needs a better name) to pass things up through the tree. We would probably need both of these classes and keep a Map<Producer, Integer> which gets copied. This is the way I would code it at least. We can discuss if you think this would get too cumbersome, but I always prefer the non-mutation route. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java@33 PS7, Line 33: private final String name_; We really should avoid making this a mutable class. Is this necessary? http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java@64 PS7, Line 64: @Override Same as above: Does anyone call "getChild()"? I might have missed it. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@40 PS7, Line 40: I made some comments below, but this might be dangerous to have. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@71 PS7, Line 71: public double estimateRowCount(RelMetadataQuery mq) { Is this because we are deriving from AbstractRelNode which requires this? I don't think this would be called. I think we should throw an exception here unless you've seen this called. I'd rather not rely on the inputs here since other classes haven't implemented this. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@81 PS7, Line 81: public List<RelNode> getInputs() { I'm surprised we need this. Could we derive off of something lower than AbstractRelNode which implements this method (if it's not in AbstractRelNode)? http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@87 PS7, Line 87: inputs_.set(ordinalInParent, p); Same as above comment. Keeping track of "inputs_" actually gets weird during HepPlanner time since they add those HepRelVertex intermediary nodes http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@203 PS10, Line 203: consumer.setProducer(producer); So I don't have this quite mapped out in my head right now, but as I mentioned elsewhere, I personally try to avoid any mutation in classes... Would it be possible to create this information and store in the ImpalaSequence? That seems to be always at the top? Then maybe we can pass some information through the ParentRelNodeContext to the appropriate places? http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@359 PS10, Line 359: // CTEs currently only supported in SingleNodePlanner. You mean an executor with a single node here, not the SingleNodePlanner class which creates a single node plan but then expands to a plan with more nodes later. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@438 PS10, Line 438: optCluster.invalidateMetadataQuery(); What is the particular reason we need to invalidate? I'm not sure what this does within Calcite...my guess is that it clears the cache? Would this be necessary? I guess I'm not sure. -- To view, visit http://gerrit.cloudera.org:8080/22091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0840c0859d2fe25628d799a18d302cee1eb36e8 Gerrit-Change-Number: 22091 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Fri, 28 Nov 2025 02:53:18 +0000 Gerrit-HasComments: Yes
