Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
......................................................................


Patch Set 7:

(18 comments)

I managed to finish the first iteration on the code. Thanks for all this 
effort, Peter!

http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h
File be/src/exec/iceberg-merge-node.h:

http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@61
PS7, Line 61:   ScalarExpr* row_present_ = nullptr;
nit: this section below would be more readable with linebreaks included.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@103
PS7, Line 103:   Status EvaluateCases(RowBatch* output_batch);
nit: without linebreaks this part of the class lacks any structure and it's not 
that easy to read.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@148
PS7, Line 148:   std::vector<ScalarExpr*> filter_conjuncts_;
nit: I believe that the order of the members is constants first, then functions 
then variables


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@177
PS7, Line 177:     return TIcebergMergeSinkAction::BOTH;
DCHECK for type_ == BOTH?


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc
File be/src/exec/iceberg-merge-node.cc:

http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@43
PS7, Line 43:   auto& tmerge_cases = tnode.merge_node.cases;
No need to create a variable for this, only used once. We can use the 
tnode.merge_node.cases directly in the for loop.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@48
PS7, Line 48:     RETURN_IF_ERROR(merge_case_plan->Init(tmerge_case, state, 
row_descriptor_));
Since IcebergMergeCasePlane is not an inherited class, I think we can get rid 
of the Init() fn and use the constructor for initialization. Additionally, we 
might be able to make the member variables const if populated in the 
constructor.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@67
PS7, Line 67: RowDescriptor* row_desc
row_desc seems a simple 'in' parameter. should be const ref


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@92
PS7, Line 92:   DCHECK(pnode.merge_action_tuple_id_ != -1);
nit: some linebreaks would be nice to increase readability with the structure.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@135
PS7, Line 135:   RETURN_IF_ERROR(child(0)->Open(state));
Apparnetly, in Preapre() we didn't have to call Prepare for the child, bit in 
Open we do have to delegate it to the child. Could you help me understand, why?


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@206
PS7, Line 206: { continue; }
nit: no need for the braces


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@217
PS7, Line 217:   TupleRow* dst_row = output_batch->GetRow(dst_row_idx);
You can merge this line with the one above


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@226
PS7, Line 226:   TIcebergMergeSinkAction::type action = 
merge_case->SinkAction();
You declare this too early. Even no need for a variable since the expression is 
short enough, you can use 'merge_case->SinkAction()' at L242.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc
File be/src/exec/iceberg-merge-sink.cc:

http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@43
PS7, Line 43:   DCHECK(tsink.child_data_sinks[1].table_sink.action == 
TSinkAction::DELETE);
nit: linebreak after the last DCHECK pls


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@48
PS7, Line 48:   const TDataSink& delete_sink = tsink.child_data_sinks[1];
nit: linebreak before this line to separate the code between insert_sink and 
delete_sink


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@55
PS7, Line 55:   auto merge_action = tsink.output_exprs[0];
this could be merged into the line below. it'd still fit into one line


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@81
PS7, Line 81:   profile()->AddChild(insert_sink_->profile());
I recall that in some use-cases it's not neccessary to create both sinks. 
Should we check for nullness here for the sinks? I see other places too where 
we call functions on the sink pointers without checking for nullness.


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@87
PS7, Line 87:   merge_action_evaluator_ = output_expr_evals_[0];
we could do this in the constructor


http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@108
PS7, Line 108: auto index = delete_rows.AddRow();
             :       auto output_row = delete_rows.GetRow(index);
             :       delete_rows.CopyRow(row, output_row);
             :       delete_rows.CommitLastRow();
this block is identical to the one for the DATA case. Could you move them to a 
function?



--
To view, visit http://gerrit.cloudera.org:8080/21423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Gerrit-Change-Number: 21423
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 19 Jun 2024 14:28:12 +0000
Gerrit-HasComments: Yes

Reply via email to