Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 )
Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables ...................................................................... Patch Set 11: (22 comments) Took a look at the first half, I'll continue tomorrow. The code looks great, very readable! http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h File be/src/exec/iceberg-merge-node.h: http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@22 PS11, Line 22: #include "exec-node.inline.h" I don't think we need this #include here. http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@70 PS11, Line 70: /// The identifier of the merge action tuple that contains the : /// information whether the output row should be updated, : /// deleted or inserted nit: Comment could fit two lines only. http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@110 PS11, Line 110: TupleRow* previous_row, TupleRow* actual_row Maybe it's safer to and simpler to use Tuple* only that points to the target tuple. http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@134 PS11, Line 134: inline nit: Is 'inline' keyword needed? http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc File be/src/exec/iceberg-merge-node.cc: http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@186 PS6, Line 186: } : last_row_ = row; > It's a tricky situation; it's easy to print the target table's tuple, but i I think the best is option 2: ERROR: Duplicate row found: one target table row matched more than one source row. Target row: (4 4 4 hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b331700000005_831375517_data.0.parq 4), Source row: (4 4 4) (4 4 4) (4 4 4) (50) Option 1 is also acceptable. Option 3 is harder to interpret for the users, that's why I prefer option 2. http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc File be/src/exec/iceberg-merge-node.cc: http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc@167 PS11, Line 167: nit: +4 indent needed http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc@180 PS11, Line 180: auto row_present = row_present_evaluator_->GetTinyIntVal(row); : IcebergMergeCase* selected_case = nullptr; nit: could be moved after duplicate check http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-sink.cc File be/src/exec/iceberg-merge-sink.cc: http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-sink.cc@108 PS11, Line 108: DispatchRow DispatchRow() just adds the row to the row batch, so maybe simply 'AddRow()', or 'AddRowToBatch()' would be a better name. http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup@1020 PS6, Line 1020: table_ref:source > Table refs can be inline views so queries like `merge into target using (se I see, cool! http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@312 PS11, Line 312: { nit: missing space http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java File fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java: http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java@122 PS11, Line 122: : nit: no need for line break http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java: http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@61 PS11, Line 61: line row/record? http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@122 PS11, Line 122: !mergeStmt_.hasNotMatchedCasesOnly() Hard to read because of the negations. MergeStmt could have a 'hasMacthedCase()' method. http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@245 PS11, Line 245: : nit: missing newline http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@258 PS11, Line 258: / nit: / http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@261 PS11, Line 261: * target.file__position, target.partition__spec__id, : * target.iceberg__partition__serialized, These are only needed if there's a DELETE sink. It's not a problem to always have these in this patch, but maybe in a subsequent patch we could omit these. http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java File fe/src/main/java/org/apache/impala/analysis/MergeCase.java: http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@31 PS11, Line 31: expressions, for example: 'WHEN MATCHED AND : * source.id > 100' maps 'source.id > 100' expression as an additional filter : * expression nit: To help readability, here 'predicate' might be a better word than 'expression'. These would clearly separate filtering predicates from result expressions for the reader. http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@36 PS11, Line 36: public abstract class MergeCase extends StatementBase { We usually have a reset() method to reset the members that are modified during analyze. We don't have it here, and I wonder if the lack of reset() would cause any problems that we don't see now. http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@69 PS11, Line 69: for (Expr expr : filterExprs_) expr.analyze(analyzer); Don't we need to analyze the resultExprs_? http://gerrit.cloudera.org:8080/#/c/21423/6/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test: http://gerrit.cloudera.org:8080/#/c/21423/6/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@13 PS6, Line 13: -size=108B cardinality=40 > The merge node waits for the full row description at the evaluation, curren I'm OK with handling this separately. http://gerrit.cloudera.org:8080/#/c/21423/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test: http://gerrit.cloudera.org:8080/#/c/21423/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@227 PS7, Line 227: I don't think we should include 'target.action' in the partitioning of the EXCHANGE http://gerrit.cloudera.org:8080/#/c/21423/11/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test: http://gerrit.cloudera.org:8080/#/c/21423/11/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@219 PS11, Line 219: target.action We discussed it offline: we shouldn't shuffle the rows based on 'target.action', only based on the partition fields of the table. -- To view, visit http://gerrit.cloudera.org:8080/21423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268 Gerrit-Change-Number: 21423 Gerrit-PatchSet: 11 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 04 Jul 2024 15:54:57 +0000 Gerrit-HasComments: Yes
