Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
......................................................................


Patch Set 7:

(25 comments)

Let me flush another batch of comments now. I'm somewhere halfway through the 
analyzer changes.

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup@1027
PS6, Line 1027:     List<MergeCase> cases = Lists.newArrayList(merge_case);
nit: I think here we can merge these 2 rows


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@60
PS7, Line 60: public class IcebergMergeImpl extends MergeImpl {
As I wrote in MergeStmt, I'm not sure I'm convinced that we need a separate 
abstraction for the implementation. One reason is that we don't plan to add any 
other implementations and even if we did, the coding should hppen the other way 
around: when we add the other Merge impl then as a separate commit we could do 
the extra level of abstracion then as a refactor.

The other thing I find myself uncomfortable with is that this IcebergMergeImpl 
now uses all the inner members of MergeStmt, even the MergeStmt itself is 
passed to the constructor. For me this asks for Inheritance rather than an impl 
member.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@93
PS7, Line 93: if (mergeStmt_.isOnlyMatchedCases()) {
            :       sourceTableRef_.setJoinOp(JoinOperator.INNER_JOIN);
            :     } else {
            :       sourceTableRef_.setJoinOp(JoinOperator.FULL_OUTER_JOIN);
            :     }
            :     sourceTableRef_.setOnClause(on_);
            :     sourceTableRef_.setLeftTblRef(targetTableRef_);
This could go into a setJoinParams() function


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@120
PS7, Line 120:     icebergPositionalDeleteTable_ = new 
IcebergPositionDeleteTable(
If you have only a WHEN NOT NATCHED THEN INSERT case do you still have to 
create a pos del table?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@127
PS7, Line 127:     for (MergeCase mergeCase : mergeStmt_.getCases()) {
Since all these merge cases live inside MergeStmt, shouldn't this code live in 
MergeStmt.analyze()?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@131
PS7, Line 131:     analyzer.registerPrivReq(
Here we require ALL privs on the target table. Shouldn't we also expect SELECT 
privs on the source table?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@138
PS7, Line 138:   private void addMergeActionTuple(Analyzer analyzer) {
Function comment pls


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@140
PS7, Line 140:         
analyzer.getDescTbl().createTupleDescriptor(MERGE_ACTION_TUPLE_NAME);
What is a merge action and why is a tuple needed for it? Could you comment it 
on the class?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@153
PS7, Line 153:   public void substituteResultExprs(ExprSubstitutionMap smap,
It stinks for me that MergeStmt also has a fn with similar name but here we 
inherit this not from there but from another interface, MergeImpl. For me this 
also indicates that this class should be a child of MergeStmt


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@186
PS7, Line 186:   public PlanNode getPlanNode(PlannerContext ctx, PlanNode 
parent,
Creating new PlanNodes should be the responsibility of the Planner, in my 
opinion. An Analysis class in the Analysis package shouldn't know how a 
corresponding PlanNode is created.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@204
PS7, Line 204:   public DataSink createDataSink() {
I think this should be an override. Also true to some other functions in this 
class.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java
File fe/src/main/java/org/apache/impala/analysis/MergeImpl.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@20
PS7, Line 20:     public abstract QueryStmt prepareQuery();
Since this serves as an interface for the Merge implementations (about if we 
should have such an abstraction see my other comment in MergeStmt) please add a 
comment for these functions, what the implementations are expected to implement.

If we decide to keep this abstraction, I think prepareQuery() shouldn't be part 
of the interface. The reason I say this is that an interface should be as clean 
and understandable as possible and for me something like a prepareQuery() 
violates both.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@24
PS7, Line 24:                                                Analyzer analyzer);
nit: indentation


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@32
PS7, Line 32:     public TupleDescriptor getMergeActionTuple(){
If we wan't to make this a general abstraction between all other Merge 
implementations then it's not guaranteed that the other impls will have a 
MergeActionTuple. Seems to specific to include it in a general interface.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@33
PS7, Line 33:         return mergeActionTuple_;
this and below could fit into one line


http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java
File fe/src/main/java/org/apache/impala/analysis/MergeStmt.java:

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@38
PS6, Line 38:
nit: onClause?


http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@38
PS6, Line 38:  impl_;
nit: make the name of these 2 variables consistent with each other


http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@78
PS6, Line 78: blRefs.add(sourceTableRef_);
> This condition is required to avoid the case where an inline view is provid
analyze() already checks for that, right? here it could be a precondition then


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java
File fe/src/main/java/org/apache/impala/analysis/MergeStmt.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@32
PS7, Line 32: public class MergeStmt extends DmlStatementBase {
I really miss the comments from this class explain the purpose of things.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@38
PS7, Line 38:   private MergeImpl impl_;
Is this detaching the implementation thing was to follow the architecture of 
Update? Doesn't have a strong opinion about this but I kind of feel that it is 
an overkill here sinec we only have an Iceberg implementation and not planning 
to do others if I'm not mistaken.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@122
PS7, Line 122:     return targetTableRef_;
Some of these could fit into a single line.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@129
PS7, Line 129:   public List<MergeCase> getCases() {
nit: add new line above


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@130
PS7, Line 130:     return cases_;
nit: function fits into single line.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeStmt.java@133
PS7, Line 133: isOnlyMatchedCases
nit: hasMatchedCasesOnly?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@1295
PS7, Line 1295:   public static class PartitionExprBundle {
I personally don't like this new PartitionExprBundle class. First, the 'Bundle' 
in the name doesn't mean anything, it could be either PartitionExprStuff, or 
PartitionExprThings either.

If I'm not mistaken this is introduced for a refactor so that 
populatePartitionExprs() could return a single object instead of having 2 out 
parameters. Is this right?

The original approach was preferrable for me because the caller was forced to 
provide some params for the desired data, but with the new approach you get a 
'Bundle' that you don't know what it contains internally.

Not to mention that (in case this is a pure refactor) with a patch of this size 
including refactors will bring extra noise for the reader.



--
To view, visit http://gerrit.cloudera.org:8080/21423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Gerrit-Change-Number: 21423
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 13 Jun 2024 14:17:08 +0000
Gerrit-HasComments: Yes

Reply via email to