Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 )
Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables ...................................................................... Patch Set 7: (18 comments) I managed to finish the first iteration on the code. Thanks for all this effort, Peter! http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h File be/src/exec/iceberg-merge-node.h: http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@61 PS7, Line 61: ScalarExpr* row_present_ = nullptr; nit: this section below would be more readable with linebreaks included. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@103 PS7, Line 103: Status EvaluateCases(RowBatch* output_batch); nit: without linebreaks this part of the class lacks any structure and it's not that easy to read. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@148 PS7, Line 148: std::vector<ScalarExpr*> filter_conjuncts_; nit: I believe that the order of the members is constants first, then functions then variables http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.h@177 PS7, Line 177: return TIcebergMergeSinkAction::BOTH; DCHECK for type_ == BOTH? http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc File be/src/exec/iceberg-merge-node.cc: http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@43 PS7, Line 43: auto& tmerge_cases = tnode.merge_node.cases; No need to create a variable for this, only used once. We can use the tnode.merge_node.cases directly in the for loop. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@48 PS7, Line 48: RETURN_IF_ERROR(merge_case_plan->Init(tmerge_case, state, row_descriptor_)); Since IcebergMergeCasePlane is not an inherited class, I think we can get rid of the Init() fn and use the constructor for initialization. Additionally, we might be able to make the member variables const if populated in the constructor. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@67 PS7, Line 67: RowDescriptor* row_desc row_desc seems a simple 'in' parameter. should be const ref http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@92 PS7, Line 92: DCHECK(pnode.merge_action_tuple_id_ != -1); nit: some linebreaks would be nice to increase readability with the structure. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@135 PS7, Line 135: RETURN_IF_ERROR(child(0)->Open(state)); Apparnetly, in Preapre() we didn't have to call Prepare for the child, bit in Open we do have to delegate it to the child. Could you help me understand, why? http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@206 PS7, Line 206: { continue; } nit: no need for the braces http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@217 PS7, Line 217: TupleRow* dst_row = output_batch->GetRow(dst_row_idx); You can merge this line with the one above http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-node.cc@226 PS7, Line 226: TIcebergMergeSinkAction::type action = merge_case->SinkAction(); You declare this too early. Even no need for a variable since the expression is short enough, you can use 'merge_case->SinkAction()' at L242. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc File be/src/exec/iceberg-merge-sink.cc: http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@43 PS7, Line 43: DCHECK(tsink.child_data_sinks[1].table_sink.action == TSinkAction::DELETE); nit: linebreak after the last DCHECK pls http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@48 PS7, Line 48: const TDataSink& delete_sink = tsink.child_data_sinks[1]; nit: linebreak before this line to separate the code between insert_sink and delete_sink http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@55 PS7, Line 55: auto merge_action = tsink.output_exprs[0]; this could be merged into the line below. it'd still fit into one line http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@81 PS7, Line 81: profile()->AddChild(insert_sink_->profile()); I recall that in some use-cases it's not neccessary to create both sinks. Should we check for nullness here for the sinks? I see other places too where we call functions on the sink pointers without checking for nullness. http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@87 PS7, Line 87: merge_action_evaluator_ = output_expr_evals_[0]; we could do this in the constructor http://gerrit.cloudera.org:8080/#/c/21423/7/be/src/exec/iceberg-merge-sink.cc@108 PS7, Line 108: auto index = delete_rows.AddRow(); : auto output_row = delete_rows.GetRow(index); : delete_rows.CopyRow(row, output_row); : delete_rows.CommitLastRow(); this block is identical to the one for the DATA case. Could you move them to a function? -- To view, visit http://gerrit.cloudera.org:8080/21423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268 Gerrit-Change-Number: 21423 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 19 Jun 2024 14:28:12 +0000 Gerrit-HasComments: Yes
