Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24143 )

Change subject: IMPALA-14755: (part 2) Implement Iceberg deletion vector 
reading/writing
......................................................................


Patch Set 8:

(13 comments)

Thanks, Zoltan

http://gerrit.cloudera.org:8080/#/c/24143/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24143/4//COMMIT_MSG@25
PS4, Line 25: equality-delete files
> I think this has been resolved. Can we test it?
Done


http://gerrit.cloudera.org:8080/#/c/24143/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24143/7//COMMIT_MSG@24
PS7, Line 24: rejecte
> nit: rejected?
Done


http://gerrit.cloudera.org:8080/#/c/24143/4/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/24143/4/be/src/exec/iceberg-delete-builder.h@154
PS4, Line 154: ring the buil
> > or we can reconsider enabling V3 deletes even if V2 pos deletes are there
Ack


http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-builder.cc@205
PS7, Line 205:
> dv_loader.fs_cache_ is recreated each time. Maybe we could create an fs_cac
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-builder.cc@192
PS7, Line 192:               file_path_str, content_size));
             :         }
             :         VLOG_FILE << "Found DV for file " << file_path_str << " 
=> "
             :                   << ice_del_vector->path()->str();
             :         SCOPED_TIMER(dv_load_timer_);
             :         RETURN_IF_ERROR(dv_reader_.Load(reader_context_.get(), 
mem_tracker(), &obj_pool_,
             :             ice_del_vector->path()->str(), 
ice_del_vector->content_offset(),
             :             content_size, &bitmap));
             :       }
             :       deleted_rows_.emplace(std::piecewise_construct,
             :           std::forward_as_tuple(ptr_copy, 
file_path_str.length()),
             :           std::forward_as_tuple(std::move(bitmap)));
             :     }
             :   }
             :
             :   return Status::OK();
             : }
             :
             : Status IcebergDeleteBuilder::Prepare(
             :     RuntimeState* state, MemTracker* parent_mem_tracker) {
             :   RETURN_IF_ERROR(DataSink::Prepare(state, parent_mem_tracker));
             :
> nit: this could be moved to the above if stmt's body. Then L184 could be mo
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-sink-base.h@67
PS7, Line 67: const std::map<THash128, TIcebergDeletionVector>* 
referenced_deletion_v
> Can it just be a reference to the SinkConfig's map?
I used a const pointer


http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/24143/7/be/src/exec/iceberg-delete-sink-base.cc@136
PS7, Line 136: output_partition->referenced_deletion_vectors
> Could it be a pointer/reference?
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/analysis/IcebergDeleteImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergDeleteImpl.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/analysis/IcebergDeleteImpl.java@50
PS7, Line 50: >=
> I think this should be >=, so we don't skip this test accidentally when we
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@90
PS7, Line 90:       // PUFFIN format is used for deletion vectors, which are 
only supported in Iceberg
> I thought we only create IcebergDeleteTable for reading position delete fil
This is needed for writing; the output partition init uses the partition's file 
format for deciding which writer needs to be instantiated, I found this 
approach the most suitable, but I agree that the creation of Iceberg delete 
tables are a bit messy.


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java@71
PS7, Line 71:               // A data file can have at most one DV, but may 
also be covered by legacy
            :               // position-delete files simultaneously (e.g. after 
a partial V3 migration
            :               // by an external engine). Assert there is no 
second DV for this file.
            :               Hash128 dataFileHash = 
IcebergUtil.getFilePathHash(dataFile);
            :               TIcebergDeletionVector dv = 
dataFileToDV.put(dataFileHash,
            :                   
IcebergUtil.createTIcebergDeletionVector(delFile));
            :               Preconditions.checkState(dv == null,
            :                   "More than one deletion vector found for data 
file: %s",
            :                   dataFile.location());
> We could check that dataFileToDV.put() returns null. This way we avoid the
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@320
PS7, Line 320: dataFileToDV_
> Would it make sense to store a FbIcebergDeletionVector object's ByteBuffer
You mean moving away from TIcebergDeletionVector in the FE reading part? I 
think eventually we can eliminate the Thrift structure  in this case.


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@740
PS7, Line 740:                 // A data file can have at most one DV, but may 
also be covered by legacy
             :                 // position-delete files simultaneously (e.g. 
after a partial V3 migration
             :                 // by an external engine). Assert there is no 
second DV for this file.
             :                 Hash128 dataFileHash = 
IcebergUtil.getFilePathHash(dataFile.location());
             :                 TIcebergDeletionVector dv = 
dataFileToDV_.put(dataFileHash,
             :                     
IcebergUtil.createTIcebergDeletionVector(delFile));
             :                 Preconditions.checkState(dv == null,
             :                     "More than one deletion vec
> We could check that dataFileToDV.put() returns null. This way we avoid the
Done


http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/24143/7/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@474
PS7, Line 474:         DeleteFile deleteFile = 
createDeletionVector(feIcebergTable, dv);
             :         rowDelta.addDeletes(deleteFile);
> We should add tests where we delete from a table that already has DVs writt
Done



--
To view, visit http://gerrit.cloudera.org:8080/24143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5613c31a7aa46b94b7c70386c939c08cc68632cd
Gerrit-Change-Number: 24143
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 10 Apr 2026 14:29:51 +0000
Gerrit-HasComments: Yes

Reply via email to