Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22091 )

Change subject: IMPALA-13531: Calcite CTE frontend
......................................................................


Patch Set 10:

(23 comments)

Right before I hit "post', I noticed I forgot to post some review comments from 
last year, so if some of them are stale, I do apologize.

http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc@93
PS10, Line 93: DEFINE_string(cte_suggester_class, 
"org.apache.impala.calcite.cte.IdentitySuggester",
Is there a reason we are making this a backend config option rather than a 
query option?


http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java
File fe/src/main/java/org/apache/impala/planner/CTEScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java@59
PS7, Line 59:     Preconditions.checkArgument(cteExprs_.size() == 
desc.getSlots().size());
Nit: CTEScanNode.class


http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/planner/SequenceNode.java
File fe/src/main/java/org/apache/impala/planner/SequenceNode.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/planner/SequenceNode.java@53
PS10, Line 53:     assignConjuncts(analyzer);
If SequenceNode is at the top, will there be any conjuncts to assign?  I 
wouldn't think this is necessary.


http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/fe/src/main/java/org/apache/impala/service/Frontend.java@169
PS10, Line 169: import org.apache.impala.planner.CTEScanNode;
Nit: Unused


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java@21
PS7, Line 21: public class Exception extends ImpalaException {
Gonna nit this one because I can see the argument that since it's already under 
a "cte" directory, calling this CTEException would be redundant.

Having said that, if the "catch" part says
catch(Exception e)

That's gonna be a little confusing.

Wouldn't stop the code review due to this, but if you argue this one back, I 
may give in easily.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Exception.java@21
PS10, Line 21: public class Exception extends ImpalaException {
Optional: I guess I'm ok with this, but it could be confusing to have the 
"Exception" class name overloaded.  If I were to use this in a file that caught 
the exception but also wanted to capture a general Exception, there would be a 
naming conflict.

I'll leave this up to you if you want to change it.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java@33
PS7, Line 33: public final class Registry {
Same argument that I made with Exception, though I'm a little more forgiving 
here since it is unlikely (but not impossible) that there are 2 Registry 
classes.

I think about this whenever I have to include "Table" in certain places and it 
is used at both a high level, and a metastore level which leads to really 
annoying full path package code if I have to include both Table classes.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/Registry.java@39
PS7, Line 39:   private final Map<String, RelNode> ctes = new HashMap<>();
Nit: ctes_


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@35
PS7, Line 35:     if (name == null || name.isEmpty()) {
Optional: StringUtils.isEmpty()


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@36
PS7, Line 36:       return (query, conf) -> Collections.emptyList();
My java brain is fuzzing out here.  Does this compile?


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@27
PS10, Line 27:   public static final String CTE_THRESHOLD = 
"impala.cte_threshold";
Is the threshold used?

Is this meant for a safety valve or a future check?


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@86
PS7, Line 86:   public void setProducer(ImpalaCTEProducer producer) {
Does anyone call this?  I would like to avoid making this a mutable class


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@36
PS10, Line 36: public class ImpalaCTEConsumer extends TableScan implements 
ImpalaPlanRel {
> I based this on a TableScan because I figured RelOptMaterialization in http
SJC Placeholder:  Check if RelOptMaterialization.qualifiedTableName is 
used...how is digest being used as well.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@74
PS10, Line 74:         producer_.getRowType(), producer_.getChild(), getName(), 
producer_.nextIndex());
> This mutates the producer, but couldn't figure out a better way to provide
So the general mechanism I've tried to use to avoid mutation is to use 
ParentPlanRelContext to pass information down the RelNode tree and 
NodeWithExprs (which prolly needs a better name) to pass things up through the 
tree.

We would probably need both of these classes and keep a Map<Producer, Integer> 
which gets copied.

This is the way I would code it at least.  We can discuss if you think this 
would get too cumbersome, but I always prefer the non-mutation route.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java@33
PS7, Line 33:   private final String name_;
We really should avoid making this a mutable class.  Is this necessary?


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEProducer.java@64
PS7, Line 64:   @Override
Same as above:  Does anyone call "getChild()"?  I might have missed it.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@40
PS7, Line 40:
I made some comments below, but this might be dangerous to have.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@71
PS7, Line 71:   public double estimateRowCount(RelMetadataQuery mq) {
Is this because we are deriving from AbstractRelNode which requires this?

I don't think this would be called.  I think we should throw an exception here 
unless you've seen this called.  I'd rather not rely on the inputs here since 
other classes haven't implemented this.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@81
PS7, Line 81:   public List<RelNode> getInputs() {
I'm surprised we need this.  Could we derive off of something lower than 
AbstractRelNode which implements this method (if it's not in AbstractRelNode)?


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@87
PS7, Line 87:     inputs_.set(ordinalInParent, p);
Same as above comment.  Keeping track of "inputs_" actually gets weird during 
HepPlanner time since they add those HepRelVertex intermediary nodes


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@203
PS10, Line 203:         consumer.setProducer(producer);
So I don't have this quite mapped out in my head right now, but as I mentioned 
elsewhere, I personally try to avoid any mutation in classes...

Would it be possible to create this information and store in the 
ImpalaSequence?  That seems to be always at the top?  Then maybe we can pass 
some information through the ParentRelNodeContext to the appropriate places?


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@359
PS10, Line 359:       // CTEs currently only supported in SingleNodePlanner.
You mean an executor with a single node here, not the SingleNodePlanner class 
which creates a single node plan but then expands to a plan with more nodes 
later.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@438
PS10, Line 438:     optCluster.invalidateMetadataQuery();
What is the particular reason we need to invalidate?

I'm not sure what this does within Calcite...my guess is that it clears the 
cache?  Would this be necessary?  I guess I'm not sure.



--
To view, visit http://gerrit.cloudera.org:8080/22091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0840c0859d2fe25628d799a18d302cee1eb36e8
Gerrit-Change-Number: 22091
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Anonymous Coward (816)
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Fri, 28 Nov 2025 02:53:18 +0000
Gerrit-HasComments: Yes

Reply via email to