Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24071 )

Change subject: IMPALA-14755:(part 1) Implement Puffin Blob reader and File 
writer
......................................................................


Patch Set 5:

(5 comments)

I changed the dataflow to use hashes everywhere when it's possible, the only 
part that requires real paths is the returning part to the coordinator where we 
must supply read Puffin file paths.

http://gerrit.cloudera.org:8080/#/c/24071/4/be/src/exec/output-partition.h
File be/src/exec/output-partition.h:

http://gerrit.cloudera.org:8080/#/c/24071/4/be/src/exec/output-partition.h@130
PS4, Line 130:
> nit: requires +4 indent
Done


http://gerrit.cloudera.org:8080/#/c/24071/4/be/src/exec/puffin/puffin-writer.cc
File be/src/exec/puffin/puffin-writer.cc:

http://gerrit.cloudera.org:8080/#/c/24071/4/be/src/exec/puffin/puffin-writer.cc@60
PS4, Line 60: (p
> Should we call Close() just in case?
It can cause double closes, I used the pattern from HdfsParquetTableWriter, and 
HdfsTextTableWriter, I could add a IsClosed check here, but it would break the 
scheme used in other writers.


http://gerrit.cloudera.org:8080/#/c/24071/4/be/src/exec/puffin/puffin-writer.cc@137
PS4, Line 137:
> I think we should start this timer after we finished loading the old DVs.
Done


http://gerrit.cloudera.org:8080/#/c/24071/3/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/24071/3/be/src/util/hash-util.h@338
PS3, Line 338:       } else {
             :         temp >>= 1;
             :       }
             :     }
             :     return temp;
             :   }
             :
             :   t
> I think the current version (PS4) is OK, the namespace is annotated.
Ack


http://gerrit.cloudera.org:8080/#/c/24071/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/24071/3/common/thrift/CatalogObjects.thrift@679
PS3, Line 679: THash1
> I think there're two possiblities:
I restored the map to THash128, TIcebergDeletionVector, it's possible to use 
hashes everywhere and only use paths for the created Puffin files.



--
To view, visit http://gerrit.cloudera.org:8080/24071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I068a071f9db907064ccec8568db5234863eb4587
Gerrit-Change-Number: 24071
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 17 Mar 2026 15:41:06 +0000
Gerrit-HasComments: Yes

Reply via email to