ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI ...................................................................... Patch Set 9: (18 comments) http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h File be/src/exec/paimon/paimon-jni-row-reader.h: http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@36 PS8, Line 36: > nit: 'an' is redundant Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@43 PS8, Line 43: & > nit: we usually don't put space here Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@48 PS8, Line 48: : > nit: redundant empty lines Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@53 PS8, Line 53: > nit: we put the '*' next to the type Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.h@97 PS8, Line 97: > nit: For WriteStringOrBinarySlot String is mentioned first, then Binary, an Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc File be/src/exec/paimon/paimon-jni-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@64 PS8, Line 64: > nit: needs +4 indent Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@123 PS8, Line 123: > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@148 PS8, Line 148: case TYPE_ST > We could add DCHECK_NOTNULL around the dynamic_cast. Current code might der Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@149 PS8, Line 149: //TODO: implement struct type support later : tuple->SetNull(slot_desc->null_indicator_offset()); : break; > This pattern repeats multiple times, could be moved to a template function Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@171 PS8, Line 171: Array *>(array); > Is it needed? 'v' is already int16_t. Same goes for L179, L187, L195. Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-row-reader.cc@270 PS8, Line 270: > Do we have tests with TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP (WITHOUT TIMESTAMP WITH LOCAL TIME ZONE have tested locally, the result is compared with spark query result, it is the same, since the result is related with local timezone, so the test case is not included in the test suite currently. http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.h File be/src/exec/paimon/paimon-jni-scan-node.h: http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.h@57 PS8, Line 57: Consume > nit: Consume Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc File be/src/exec/paimon/paimon-jni-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@63 PS8, Line 63: Batch" > nit: Batches Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@145 PS8, Line 145: OffheapTrackFree(); : OffheapTrackAllocation(offheap_consumed_bytes); > See the memory tracking best practice here. I performed the following test using one query, the query contains two paimon table, for observation, I intentioanlly allocate 512M offheap object for 1 table, and 1G for another table. below is the process mem usage: Process: Limit=12.00 GB Total=1.05 GB Peak=1.60 GB Buffer Pool: Free Buffers: Total=37.62 MB Buffer Pool: Clean Pages: Total=0 Buffer Pool: Unused Reservation: Total=-64.00 MB Control Service Queue: Limit=122.88 MB Total=0 Peak=548.20 KB Data Stream Service Queue: Limit=614.40 MB Total=0 Peak=37.10 KB Data Stream Manager Early RPCs: Total=0 Peak=0 TCMalloc Overhead: Total=31.44 MB WebserverCompressedBuffer: Total=0 Peak=0 RequestPool=fe-eval-exprs: Total=0 Peak=0 RequestPool=default-pool: Total=863.34 MB Peak=1.20 GB Query(bc4a69f6ed5d857f:0aa1ff8f00000000): Reservation=350.00 MB ReservationLimit=9.60 GB OtherMemory=513.33 MB Total=863.34 MB Peak=1.20 GB Unclaimed reservations: Reservation=38.00 MB OtherMemory=0 Total=38.00 MB Peak=106.00 MB Fragment bc4a69f6ed5d857f:0aa1ff8f00000001: Reservation=0 OtherMemory=0 Total=0 Peak=1.00 GB PAIMON_SCAN_NODE (id=1): Total=0 Peak=1.00 GB KrpcDataStreamSender (dst_id=5): Total=0 Peak=66.00 KB RowBatchSerialization: Total=0 Peak=66.00 KB Fragment bc4a69f6ed5d857f:0aa1ff8f00000002: Reservation=312.00 MB OtherMemory=513.25 MB Total=825.25 MB Peak=826.14 MB AGGREGATION_NODE (id=3): Reservation=34.00 MB OtherMemory=1.01 MB Total=35.01 MB Peak=35.01 MB GroupingAggregator 0: Reservation=34.00 MB OtherMemory=21.12 KB Total=34.02 MB Peak=34.02 MB Exprs: Total=21.12 KB Peak=21.12 KB HASH_JOIN_NODE (id=2): Reservation=278.00 MB OtherMemory=133.25 KB Total=278.13 MB Peak=279.02 MB Exprs: Total=13.12 KB Peak=13.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB PAIMON_SCAN_NODE (id=0): Total=512.10 MB Peak=512.10 MB Exprs: Total=4.00 KB Peak=4.00 KB Arrow Batch: Total=512.10 MB Peak=512.10 MB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=288.00 KB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=6): Total=0 Peak=0 RowBatchSerialization: Total=0 Peak=0 Fragment bc4a69f6ed5d857f:0aa1ff8f00000003: Reservation=0 OtherMemory=29.12 KB Total=29.12 KB Peak=29.12 KB SORT_NODE (id=4): Total=0 Peak=0 AGGREGATION_NODE (id=7): Reservation=0 OtherMemory=21.12 KB Total=21.12 KB Peak=21.12 KB GroupingAggregator 0: Total=21.12 KB Peak=21.12 KB Exprs: Total=21.12 KB Peak=21.12 KB EXCHANGE_NODE (id=6): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=8): Total=0 Peak=0 RowBatchSerialization: Total=0 Peak=0 CodeGen: Total=668.00 B Peak=668.00 B Fragment bc4a69f6ed5d857f:0aa1ff8f00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=8): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=13.60 KB Peak=13.60 KB CodeGen: Total=1.96 KB Peak=1.96 KB CodeGen: Total=33.88 KB Peak=33.88 KB Untracked Memory: Total=202.25 MB It is observed the allocated heap is not in the Untracked memory anymore. and the memory is correctly assigned to the ArrowBatch memtracker. So I think it is working as expected, it is not double counting the memory. http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@148 PS8, Line 148: : > nit: too many empty lines Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@162 PS8, Line 162: > nit: redundant empty line Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@209 PS8, Line 209: tNe > nit: row batch Done http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@218 PS8, Line 218: *paimon_arrow_record_batch_holder_, arrow_record_batch_row_index_, tuple_desc_, : tuple, row_batch->tuple_data_pool(), state)); : /// Evaluate conjuncts on this tuple row : if (ExecNode::EvalConjuncts(conjunct_evals().data(), : conjunct_evals().size(), tuple_row)) { : row_batch->CommitLastRow(); : tuple = reinterpret_cast<Tuple*>( : reinterpret_cast<uint8_t*>(tuple) + tuple_desc_->byte_size()); : IncrementNumRowsReturned(1); : } else { > Since we are getting column-oriented data, in the future (definitely not in will update the code in future CR -- To view, visit http://gerrit.cloudera.org:8080/23613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Gerrit-Change-Number: 23613 Gerrit-PatchSet: 9 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Mon, 24 Nov 2025 20:11:24 +0000 Gerrit-HasComments: Yes
