Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 )
Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables ...................................................................... Patch Set 7: (16 comments) Another batch. I'm almost done with the Analyzer part. Will take another look at MergeInsert and then will move to the Planner. http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@24 PS6, Line 24: > Done This doesn't seem done http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java File fe/src/main/java/org/apache/impala/analysis/MergeCase.java: http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@28 PS7, Line 28: parent_ > nit: I think naming this mergeStmt_ would be more descriptive than 'parent_ Anyway, I checked the usages for parent_ and apparently we query table names and columns from MergeStmt. This for me violates encapsulation for MergeCase classes and I'd think it would be a better approach to not pass MergeStmt into MergeCase and pass the necessary attributes only. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@33 PS7, Line 33: protected MergeCase() { filterExprs_ = Collections.emptyList(); } With this implementation we rely on the inherited classes to initialize resultExprs_. This could be unsafe if someone in the future adds code to this base class and assume that similarly to filterExprs_, resultExprs_ is also non-null. I'd rather initialize as an empty list here. Maybe wa can do precondition check before the usages to verify this is not null. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java File fe/src/main/java/org/apache/impala/analysis/MergeInsert.java: http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@33 PS7, Line 33: public class MergeInsert extends MergeCase { comments pls, with examples for better understanding http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@35 PS7, Line 35: /** nit: I think regular, single line comment format is fine for member variables. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@49 PS7, Line 49: nit: extra line http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@65 PS7, Line 65: for (Expr expr : resultExprs_) { nit: fits into single line http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@82 PS7, Line 82: return new MergeInsert(parent_, resultExprs_, getFilterExprs()); This clone doesn't actually clone the internal state of this class just for the superclass. I find this misleading because even though we do a clone we actually lose internal information. At this point of time you know where the clone() fn is actually invoked and you know that some of the members won't be needed from that point on, but it is not guaranteed that in the future cloning won't be called elsewhere. I think this isn't a futureproof approach. I'd prefer actual clone where we populate everything. Also, see other clone() implementations in MergeCase subclasses. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@102 PS7, Line 102: Set<String> duplicateFilteredColumnPermutation = new LinkedHashSet<>( I think duplicate check could be achieved easier than this. You just create an empty HashSet, start iterating through columnPermutation_, keep adding the col names to the HashSet and once you find a duplicate you throw an exception. I think it's enough to return an error with the first duplicate found, no need to gather all of them. Not to mention that here you do many iterations on these columns by pre-deduplicating, then copying, then removing. This could be achieved in one iteration even if you want to collect all the dups. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@127 PS7, Line 127: for (int i = 0; i < columns.size(); ++i) { Is the index 'i' used anywhere? Can this be a foreach on 'columns'? http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@225 PS7, Line 225: public static class ColumnMatch { This inner class is for internal use, right? Can be private/protected then. Same for ColumnMatches. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@227 PS7, Line 227: private final Column targetTableColumn; nit: '_' suffix http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@242 PS7, Line 242: public MatchType getMatchType() { Since this is an internal class for internal use, I wouldn't bother with these getters and would make the members public. Similarly as you did with ColumnMatches. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@272 PS7, Line 272: public ColumnMatches() { nit: fits into single line http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeInsert.java@277 PS7, Line 277: return list.stream().allMatch(columnMatch -> columnMatch.getMatchType() nit: I'd break this like: list.stream() .allMatch(...) .equals() http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java File fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java: http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java@89 PS7, Line 89: } catch (AnalysisException e) { Are we sure that the only case where we catch an AnalysisException is when the path is ambiguous? Can't we hide other exceptions with this catch block? -- To view, visit http://gerrit.cloudera.org:8080/21423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268 Gerrit-Change-Number: 21423 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 17 Jun 2024 16:01:37 +0000 Gerrit-HasComments: Yes
