Peter Rozsa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24071 )

Change subject: IMPALA-14755:(part 1) Implement Puffin Blob reader and File writer
......................................................................

IMPALA-14755:(part 1) Implement Puffin Blob reader and File writer

This is the first part of a multi-part implementation adding support
for Iceberg deletion vectors stored in Puffin files. This commit
introduces the core infrastructure for reading and writing Puffin
format files containing deletion vector blobs.

This commit adds:
- Generic BlobReader template base class for reading blob data from
  HDFS with specialized DeletionVectorBlobReader for Puffin deletion
  vectors
- PuffinWriter that writes Puffin files with deletion vector blobs,
  supporting merging of existing and new deletion vectors
- Puffin data structures (BlobMetadata, BlobData, File) and
  serialization
- Integration with table sink pipeline via new PUFFIN THdfsFileFormat
- Extended OutputPartition with PuffinWriteResult for tracking DV
  metadata
- CRC32 checksums for blob integrity and RoaringBitmap64::Or() for DV
  merging
- Updated Thrift/FlatBuffer schemas for deletion vector metadata

Change-Id: I068a071f9db907064ccec8568db5234863eb4587
---
A be/.github/copilot-instructions.md
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/blob-reader.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/output-partition.h
A be/src/exec/puffin/CMakeLists.txt
A be/src/exec/puffin/blob.h
A be/src/exec/puffin/puffin-blob-reader.cc
A be/src/exec/puffin/puffin-blob-reader.h
A be/src/exec/puffin/puffin-writer.cc
A be/src/exec/puffin/puffin-writer.h
M be/src/exec/table-sink-base.cc
M be/src/util/hash-util.h
M be/src/util/roaring-bitmap-test.cc
M be/src/util/roaring-bitmap.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/planner.proto
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
28 files changed, 1,230 insertions(+), 26 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/24071/3
--
To view, visit http://gerrit.cloudera.org:8080/24071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I068a071f9db907064ccec8568db5234863eb4587
Gerrit-Change-Number: 24071
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
