Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 )
Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. ...................................................................... Patch Set 6: (8 comments) With this partial review I'm clear from the beginning up to iceberg-delete-sink.cc. http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h File be/src/exec/iceberg-buffered-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h@56 PS6, Line 56: /// Writes the buffered records to position delete files. We could add "after sorting" or "in the correct order". http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h File be/src/exec/iceberg-buffered-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72 PS5, Line 72: /// Nested iterator class to conveniently iterate over a FilePositions object. > Here this would need another iteration over the map (though only in debug m I feel that it is cleaner to check it here - it is a precondition of this class that there should be no empty vectors. An alternative to the extra loop would be to check that the vector is not empty whenever we call file_level_it_->second.begin(), i.e. on L75 and L103. We could either add DCHECK(!file_level_it_.second.empty()) before these calls or DCHECK(pos_level_it != file_level_it_.second.end()) afterwards. Or, even simpler, we could add DCHECK(pos_level_it != file_level_it_.second.end()) before L84 in Next(), i.e. before dereferencing 'pos_level_it_', with an error message about the emty vector. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h File be/src/exec/iceberg-delete-sink-base.h: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30 PS4, Line 30: IcebergDeleteSinkBase > I'm not sure what do you mean by that. It is currently an abstract class as Right, I missed it. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66 PS4, Line 66: TIcebergPartitionTransformType::type transform_type, const std::string& value, > I don't really find the comment misleading. Subclasses need to check it and Right, I somehow assumed that superclass methods are automatically called like in the case of destructors. http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@112 PS4, Line 112: // non void partition names and transforms. > Yeah, currently it is checked twice when the first overload is being used, Ok, it's indeed a cheap check. http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h File be/src/exec/iceberg-delete-sink.h: http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h@66 PS5, Line 66: /// having the same filepath + position), because that would corrupt the table data > That case is handled in IcebergUpdateImpl.java, so I added the comment ther I'd still consider adding a reference to the comment in IcebergUpdateImpl.java, for example "For a case where deduplication is not possible at the sink level, see the comment in IcebergUpdateImpl::buildAndValidateSelectExprs()". http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc File be/src/exec/iceberg-delete-sink.cc: http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79 PS5, Line 79: VerifyRowsNotDuplicated Can we add DCHECKs for (prev_file_path_, filepath) and (prev_position_, position) to check that the data is already sorted? http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@121 PS5, Line 121: RETURN_IF_ERROR(ConstructPartitionInfo(row, current_partition_.first.get())); What is the reason for taking this out of InitOutputPartition()? -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 12 Dec 2023 14:15:48 +0000 Gerrit-HasComments: Yes
