Peter Rozsa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/24143 )

Change subject: IMPALA-14755: (part 2) Implement Iceberg deletion vector
reading/writing
......................................................................

IMPALA-14755: (part 2) Implement Iceberg deletion vector reading/writing

This is the second part of a multi-part implementation adding support
for Iceberg deletion vectors stored in Puffin files. This commit wires
the Puffin reader/writer infrastructure from part 1 into the query
execution pipeline and catalog layer, enabling DELETE on Iceberg V3
tables using deletion vectors.

Puffin file writing is partition-scoped: the delete sink creates one
Puffin file per output partition, and each blob inside that file is a
serialised RoaringBitmap64 covering exactly the deleted row positions
of one data file in that partition. When a data file already has a
deletion vector from a previous DELETE, the existing bitmap is OR-ed
with the new one before the merged blob is written, so each Puffin file
always holds the complete, up-to-date set of deleted positions for its
partition.
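The OR-merge described above can be sketched as follows; this is an illustrative model only, not Impala's actual code, with plain Python sets standing in for the serialised RoaringBitmap64 blobs:

```python
# Sketch of merging an existing deletion vector with newly deleted row
# positions before the merged blob is written back to a Puffin file.
# A Python set of row positions stands in for a RoaringBitmap64.

def merge_deletion_vectors(existing_positions, new_positions):
    """Return the union of previously and newly deleted row positions."""
    return existing_positions | new_positions

# A data file already has rows 3 and 7 deleted; a new DELETE removes
# rows 7 and 10. The merged vector must cover all three positions.
merged = merge_deletion_vectors({3, 7}, {7, 10})
assert merged == {3, 7, 10}
```

The union is idempotent, so re-deleting an already-deleted position is harmless, which matches the invariant that each Puffin file holds the complete, up-to-date set of deleted positions for its partition.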

DELETE on a V3 table is blocked at analysis time if the table has any
existing V2 position- or equality-delete files. The table must first be
compacted with OPTIMIZE TABLE to remove those files before DELETE can be
used on it.
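The analysis-time guard behaves roughly like the following sketch; the names (check_delete_allowed, AnalysisException, has_v2_delete_files) are hypothetical stand-ins, not Impala's actual identifiers:

```python
# Hypothetical model of the analysis-time check: DELETE on a V3 table is
# rejected while any V2 position- or equality-delete files remain.

class AnalysisException(Exception):
    """Stand-in for the analyzer's error type."""

def check_delete_allowed(format_version, has_v2_delete_files):
    """Raise if DELETE cannot run on this table yet."""
    if format_version >= 3 and has_v2_delete_files:
        raise AnalysisException(
            "Table has V2 position/equality delete files; run "
            "OPTIMIZE TABLE to compact them before using DELETE.")
```

A V2 table, or a V3 table with no leftover V2 delete files, passes the check unchanged.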

Testing:
 - iceberg-v3-delete.test and iceberg-v3-delete-partition-sort.test
   added.
 - Manually validated that deletion vectors written by Spark can be read
   from Impala and deletion vectors written by Impala can be read from
   Spark.

Change-Id: I5613c31a7aa46b94b7c70386c939c08cc68632cd
---
M be/src/exec/blob-reader.h
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-sink-base.cc
M be/src/exec/iceberg-delete-sink-base.h
M be/src/exec/iceberg-delete-sink-config.cc
M be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/puffin/puffin-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/scheduling/scheduler.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/fbs/IcebergObjects.fbs
M fe/src/main/java/org/apache/impala/analysis/IcebergDeleteImpl.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
R fe/src/main/java/org/apache/impala/planner/IcebergDeleteJoinNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-delete-partition-sort.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-delete.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test
M tests/query_test/test_iceberg.py
32 files changed, 1,957 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/24143/3
--
To view, visit http://gerrit.cloudera.org:8080/24143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5613c31a7aa46b94b7c70386c939c08cc68632cd
Gerrit-Change-Number: 24143
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
