Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32
PS5, Line 32:   To minimize the overhead, we refashioned the implementation,
            :   the PaimonJniScanner will convert the paimon row batches to
            :   arrow recordbatch, which stores data in offheap region of
            :   impala JVM. And PaimonJniScanner will pass the arrow offheap
            :   record batch memory pointer to the BE backend.
            :   BE PaimonJniScanNode will directly read data from JVM offheap
            :   region, and convert the arrow record batch to impala row batch.
> It is a very good question, currently, allocation of offheap memory is not
Has follow up question in latest patch set.


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@63
PS8, Line 63: Batche
nit: Batches


http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@145
PS8, Line 145:       OffheapTrackFree();
             :       OffheapTrackAllocation(offheap_consumed_bytes);
Can this be implemented as increment rather than release and reclaim? The 
reason is, MemTracker might not able to reclaim if the portion of memory that 
it just freed is claimed by other PlanNode.

Only minimum reservation is guaranteed to be available to claim at the very 
beginning.
What happen if MemTracker is not able to increase its memory usage?

This code seems to allocate Arrow batch first then increase MemTracker. 
Usually, it is the other way around. Can we find out the size of next arrow 
batch before pulling it from GetNextBatchDirect?


http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java
File fe/src/main/java/org/apache/impala/catalog/Column.java:

http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172
PS5, Line 172:
> currently, need to reuse the field_id in IcebergStructField, to change this
I see you added PaimonStructField. Done.


http://gerrit.cloudera.org:8080/#/c/23613/8/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java
File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23613/8/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@294
PS8, Line 294: setMinMemReservationBytes(512)
Why minimum is 512 here? Should this be equal to 1 arrow batch byte size?


http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845
PS5, Line 4845: paimon_primitive_alltypes
> Done
Done


http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040
PS5, Line 2040: f _test_limit(self, vector):
> Done
Done



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Mon, 17 Nov 2025 04:55:25 +0000
Gerrit-HasComments: Yes

Reply via email to