Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
......................................................................


Patch Set 11:

(22 comments)

Took a look at the first half, I'll continue tomorrow. The code looks great, 
very readable!

http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h
File be/src/exec/iceberg-merge-node.h:

http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@22
PS11, Line 22: #include "exec-node.inline.h"
I don't think we need this #include here.


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@70
PS11, Line 70:   /// The identifier of the merge action tuple that contains the
             :   /// information whether the output row should be updated,
             :   /// deleted or inserted
nit: Comment could fit two lines only.


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@110
PS11, Line 110: TupleRow* previous_row, TupleRow* actual_row
Maybe it's safer to and simpler to use Tuple* only that points to the target 
tuple.


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.h@134
PS11, Line 134: inline
nit: Is 'inline' keyword needed?


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc
File be/src/exec/iceberg-merge-node.cc:

http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@186
PS6, Line 186:     }
             :     last_row_ = row;
> It's a tricky situation; it's easy to print the target table's tuple, but i
I think the best is option 2:

 ERROR: Duplicate row found: one target table row matched more than one source 
row. Target row: (4 4 4 
hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b331700000005_831375517_data.0.parq
 4), Source row: (4 4 4) (4 4 4) (4 4 4) (50)

Option 1 is also acceptable. Option 3 is harder to interpret for the users, 
that's why I prefer option 2.


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc
File be/src/exec/iceberg-merge-node.cc:

http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc@167
PS11, Line 167:
nit: +4 indent needed


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-node.cc@180
PS11, Line 180:     auto row_present = 
row_present_evaluator_->GetTinyIntVal(row);
              :     IcebergMergeCase* selected_case = nullptr;
nit: could be moved after duplicate check


http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-sink.cc
File be/src/exec/iceberg-merge-sink.cc:

http://gerrit.cloudera.org:8080/#/c/21423/11/be/src/exec/iceberg-merge-sink.cc@108
PS11, Line 108: DispatchRow
DispatchRow() just adds the row to the row batch, so maybe simply 'AddRow()', 
or 'AddRowToBatch()' would be a better name.


http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup@1020
PS6, Line 1020: table_ref:source
> Table refs can be inline views so queries like `merge into target using (se
I see, cool!


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@312
PS11, Line 312: {
nit: missing space


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
File fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java:

http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java@122
PS11, Line 122:
              :
nit: no need for line break


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java:

http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@61
PS11, Line 61: line
row/record?


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@122
PS11, Line 122: !mergeStmt_.hasNotMatchedCasesOnly()
Hard to read because of the negations. MergeStmt could have a 
'hasMacthedCase()' method.


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@245
PS11, Line 245:
              :
nit: missing newline


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@258
PS11, Line 258: &#47
nit: /


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@261
PS11, Line 261:    *  target.file__position, target.partition__spec__id,
              :    *  target.iceberg__partition__serialized,
These are only needed if there's a DELETE sink. It's not a problem to always 
have these in this patch, but maybe in a subsequent patch we could omit these.


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java
File fe/src/main/java/org/apache/impala/analysis/MergeCase.java:

http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@31
PS11, Line 31: expressions, for example: 'WHEN MATCHED AND
             :  * source.id > 100' maps 'source.id > 100' expression as an 
additional filter
             :  * expression
nit: To help readability, here 'predicate' might be a better word than 
'expression'. These would clearly separate filtering predicates from result 
expressions for the reader.


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@36
PS11, Line 36: public abstract class MergeCase extends StatementBase {
We usually have a reset() method to reset the members that are modified during 
analyze. We don't have it here, and I wonder if the lack of reset() would cause 
any problems that we don't see now.


http://gerrit.cloudera.org:8080/#/c/21423/11/fe/src/main/java/org/apache/impala/analysis/MergeCase.java@69
PS11, Line 69:     for (Expr expr : filterExprs_) expr.analyze(analyzer);
Don't we need to analyze the resultExprs_?


http://gerrit.cloudera.org:8080/#/c/21423/6/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test:

http://gerrit.cloudera.org:8080/#/c/21423/6/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@13
PS6, Line 13: -size=108B cardinality=40
> The merge node waits for the full row description at the evaluation, curren
I'm OK with handling this separately.


http://gerrit.cloudera.org:8080/#/c/21423/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test:

http://gerrit.cloudera.org:8080/#/c/21423/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@227
PS7, Line 227:
I don't think we should include 'target.action' in the partitioning of the 
EXCHANGE


http://gerrit.cloudera.org:8080/#/c/21423/11/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test:

http://gerrit.cloudera.org:8080/#/c/21423/11/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-merge.test@219
PS11, Line 219: target.action
We discussed it offline: we shouldn't shuffle the rows based on 
'target.action', only based on the partition fields of the table.



--
To view, visit http://gerrit.cloudera.org:8080/21423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Gerrit-Change-Number: 21423
Gerrit-PatchSet: 11
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 04 Jul 2024 15:54:57 +0000
Gerrit-HasComments: Yes

Reply via email to