This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 5f06f4743 IMPALA-13934: Do quick pointer comparison in IcebergDeleteBuilder
5f06f4743 is described below

commit 5f06f4743007bda13d1d45c2a16adab472e7ba23
Author: Zoltan Borok-Nagy <[email protected]>
AuthorDate: Thu Apr 3 21:12:51 2025 +0200

    IMPALA-13934: Do quick pointer comparison in IcebergDeleteBuilder
    
    Since IMPALA-13194, file paths are deduplicated in the serialized
    position delete records. Therefore we can do a quick pointer-based
    comparison of subsequent position delete records and skip the costly
    string comparison whenever the pointers match.
    
    If the pointers don't match, we still need to check the strings for
    equality, because position delete records coming from different senders
    can be coalesced into a single row batch by the EXCHANGE RECEIVER.
    
    Measurements
    
    The data table had ~1 trillion data records and ~68 billion position
    delete records. Average time spent in the IcebergDeleteBuilder:
    +------------+----------+-----------+
    | Node count | Original | Optimized |
    +------------+----------+-----------+
    |          5 | 12m11s   | 9m47s     |
    |         10 | 6m2s     | 5m        |
    |         20 | 3m1s     | 2m30s     |
    |         40 | 1m30s    | 1m15s     |
    +------------+----------+-----------+
    
    It's essential to optimize the builder as it blocks all the probe
    threads of the IcebergDeleteNode.
    
    Testing
     * no behaviour change, existing tests can be used
    
    Change-Id: Ie171f912a5518b6e6a445efba9d39748ecec5a36
    Reviewed-on: http://gerrit.cloudera.org:8080/22737
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 be/src/exec/iceberg-delete-builder.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/exec/iceberg-delete-builder.cc b/be/src/exec/iceberg-delete-builder.cc
index fc4833ace..510d350fc 100644
--- a/be/src/exec/iceberg-delete-builder.cc
+++ b/be/src/exec/iceberg-delete-builder.cc
@@ -298,7 +298,7 @@ Status IcebergDeleteBuilder::ProcessBuildBatch(RuntimeState* state,
     file_path = build_row->GetTuple(0)->GetStringSlot(file_path_offset_);
     pos = *build_row->GetTuple(0)->GetBigIntSlot(pos_offset_);
 
-    if (*file_path == prev_file_path) {
+    if (file_path->Ptr() == prev_file_path.Ptr() || *file_path == prev_file_path) {
       pos_buffer.push_back(pos);
     } else {
       RETURN_IF_ERROR(AddToDeletedRows(prev_file_path, pos_buffer));
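
The idea behind the one-line change can be illustrated outside of Impala. The following is a minimal, self-contained sketch, not the actual Impala code: PathRef, SamePath and CountRuns are hypothetical names introduced here, and PathRef is only a stand-in for a non-owning string slot. Unlike the diff above, the sketch also compares lengths before trusting the pointer match, which is a conservative choice of this example rather than something the commit does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a non-owning string slot (NOT Impala's StringValue).
struct PathRef {
  const char* ptr;
  std::size_t len;
};

// Returns true if both references denote the same path.
// Fast path: identical pointer and length, so no bytes need to be read.
// Slow path: byte-wise comparison, needed when equal paths live in
// different buffers (e.g. records that arrived from different senders).
inline bool SamePath(const PathRef& a, const PathRef& b) {
  assert(a.ptr != nullptr || a.len == 0);
  assert(b.ptr != nullptr || b.len == 0);
  if (a.len != b.len) return false;
  if (a.ptr == b.ptr) return true;  // cheap pointer check
  return a.len == 0 || std::memcmp(a.ptr, b.ptr, a.len) == 0;
}

// Counts runs of consecutive records that share the same path, mirroring
// how a builder could buffer positions until the path changes.
inline std::size_t CountRuns(const std::vector<PathRef>& records) {
  std::size_t runs = 0;
  for (std::size_t i = 0; i < records.size(); ++i) {
    if (i == 0 || !SamePath(records[i], records[i - 1])) ++runs;
  }
  return runs;
}
```

When most consecutive records point into the same deduplicated buffer, the loop returns from the pointer check almost every time, and the byte-wise comparison only runs at buffer boundaries.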
