Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
......................................................................


Patch Set 6:

(8 comments)

With this partial review I'm clear from the beginning up to 
iceberg-delete-sink.cc.

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h@56
PS6, Line 56:   /// Writes the buffered records to position delete files.
We could add "after sorting" or "in the correct order".


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72
PS5, Line 72:   /// Nested iterator class to conveniently iterate over a 
FilePositions object.
> Here this would need another iteration over the map (though only in debug m
I feel that it is cleaner to check it here - it is a precondition of this class 
that there should be no empty vectors.

An alternative to the extra loop would be to check that the vector is not empty 
whenever we call file_level_it_->second.begin(), i.e. on L75 and L103. We could 
either add DCHECK(!file_level_it_.second.empty()) before these calls or 
DCHECK(pos_level_it != file_level_it_.second.end()) afterwards.

Or, even simpler, we could add DCHECK(pos_level_it != 
file_level_it_.second.end()) before L84 in Next(), i.e. before dereferencing 
'pos_level_it_', with an error message about the emty vector.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30
PS4, Line 30: IcebergDeleteSinkBase
> I'm not sure what do you mean by that. It is currently an abstract class as
Right, I missed it.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66
PS4, Line 66:     TIcebergPartitionTransformType::type transform_type, const 
std::string& value,
> I don't really find the comment misleading. Subclasses need to check it and
Right, I somehow assumed that superclass methods are automatically called like 
in the case of destructors.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@112
PS4, Line 112:     // non void partition names and transforms.
> Yeah, currently it is checked twice when the first overload is being used,
Ok, it's indeed a cheap check.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h
File be/src/exec/iceberg-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h@66
PS5, Line 66:   /// having the same filepath + position), because that would 
corrupt the table data
> That case is handled in IcebergUpdateImpl.java, so I added the comment ther
I'd still consider adding a reference to the comment in IcebergUpdateImpl.java, 
for example
  "For a case where deduplication is not possible at the sink level, see the 
comment in IcebergUpdateImpl::buildAndValidateSelectExprs()".


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc
File be/src/exec/iceberg-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79
PS5, Line 79: VerifyRowsNotDuplicated
Can we add DCHECKs for (prev_file_path_, filepath) and (prev_position_, 
position) to check that the data is already sorted?


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@121
PS5, Line 121:   RETURN_IF_ERROR(ConstructPartitionInfo(row, 
current_partition_.first.get()));
What is the reason for taking this out of InitOutputPartition()?



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 12 Dec 2023 14:15:48 +0000
Gerrit-HasComments: Yes

Reply via email to