Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
......................................................................


Patch Set 7:

(16 comments)

Another batch. I'm almost done with the Analyzer part. Will take another look 
at MergeInsert and then will move to the Planner.

http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@24
PS6, Line 24:
> Done
This doesn't seem done


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java
File fe/src/main/java/org/apache/impala/analysis/MergeCase.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@28
PS7, Line 28: parent_
> nit: I think naming this mergeStmt_ would be more descriptive than 'parent_
Anyway, I checked the usages for parent_ and apparently we query table names 
and columns from MergeStmt. This for me violates encapsulation for MergeCase 
classes and I'd think it would be a better approach to not pass MergeStmt into 
MergeCase and pass the necessary attributes only.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@33
PS7, Line 33:   protected MergeCase() { filterExprs_ = Collections.emptyList(); 
}
With this implementation we rely on the inherited classes to initialize 
resultExprs_. This could be unsafe if someone in the future adds code to this 
base class and assume that similarly to filterExprs_, resultExprs_ is also 
non-null.
I'd rather initialize as an empty list here. Maybe wa can do precondition check 
before the usages to verify this is not null.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java
File fe/src/main/java/org/apache/impala/analysis/MergeInsert.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@33
PS7, Line 33: public class MergeInsert extends MergeCase {
comments pls, with examples for better understanding


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@35
PS7, Line 35:   /**
nit: I think regular, single line comment format is fine for member variables.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@49
PS7, Line 49:
nit: extra line


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@65
PS7, Line 65:     for (Expr expr : resultExprs_) {
nit: fits into single line


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@82
PS7, Line 82:     return new MergeInsert(parent_, resultExprs_, 
getFilterExprs());
This clone doesn't actually clone the internal state of this class just for the 
superclass. I find this misleading because even though we do a clone we 
actually lose internal information. At this point of time you know where the 
clone() fn is actually invoked and you know that some of the members won't be 
needed from that point on, but it is not guaranteed that in the future cloning 
won't be called elsewhere. I think this isn't a futureproof approach.

I'd prefer actual clone where we populate everything.

Also, see other clone() implementations in MergeCase subclasses.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@102
PS7, Line 102:     Set<String> duplicateFilteredColumnPermutation = new 
LinkedHashSet<>(
I think duplicate check could be achieved easier than this. You just create an 
empty HashSet, start iterating through columnPermutation_, keep adding the col 
names to the HashSet and once you find a duplicate you throw an exception. I 
think it's enough to return an error with the first duplicate found, no need to 
gather all of them.
Not to mention that here you do many iterations on these columns by 
pre-deduplicating, then copying, then removing. This could be achieved in one 
iteration even if you want to collect all the dups.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@127
PS7, Line 127:     for (int i = 0; i < columns.size(); ++i) {
Is the index 'i' used anywhere? Can this be a foreach on 'columns'?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@225
PS7, Line 225:   public static class ColumnMatch {
This inner class is for internal use, right? Can be private/protected then. 
Same for ColumnMatches.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@227
PS7, Line 227:     private final Column targetTableColumn;
nit: '_' suffix


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@242
PS7, Line 242:     public MatchType getMatchType() {
Since this is an internal class for internal use, I wouldn't bother with these 
getters and would make the members public. Similarly as you did with 
ColumnMatches.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@272
PS7, Line 272:     public ColumnMatches() {
nit: fits into single line


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@277
PS7, Line 277:       return list.stream().allMatch(columnMatch -> 
columnMatch.getMatchType()
nit: I'd break this like:
list.stream()
    .allMatch(...)
    .equals()


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java
File fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java@89
PS7, Line 89:     } catch (AnalysisException e) {
Are we sure that the only case where we catch an AnalysisException is when the 
path is ambiguous? Can't we hide other exceptions with this catch block?



--
To view, visit http://gerrit.cloudera.org:8080/21423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Gerrit-Change-Number: 21423
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 17 Jun 2024 16:01:37 +0000
Gerrit-HasComments: Yes

Reply via email to