Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI ...................................................................... Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23613/5//COMMIT_MSG@32 PS5, Line 32: To minimize the overhead, we refashioned the implementation, : the PaimonJniScanner will convert the paimon row batches to : arrow recordbatch, which stores data in offheap region of : impala JVM. And PaimonJniScanner will pass the arrow offheap : record batch memory pointer to the BE backend. : BE PaimonJniScanNode will directly read data from JVM offheap : region, and convert the arrow record batch to impala row batch. > It is a very good question, currently, allocation of offheap memory is not Has follow up question in latest patch set. http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc File be/src/exec/paimon/paimon-jni-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@63 PS8, Line 63: Batche nit: Batches http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@145 PS8, Line 145: OffheapTrackFree(); : OffheapTrackAllocation(offheap_consumed_bytes); Can this be implemented as increment rather than release and reclaim? The reason is, MemTracker might not able to reclaim if the portion of memory that it just freed is claimed by other PlanNode. Only minimum reservation is guaranteed to be available to claim at the very beginning. What happen if MemTracker is not able to increase its memory usage? This code seems to allocate Arrow batch first then increase MemTracker. Usually, it is the other way around. Can we find out the size of next arrow batch before pulling it from GetNextBatchDirect? http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java File fe/src/main/java/org/apache/impala/catalog/Column.java: http://gerrit.cloudera.org:8080/#/c/23613/5/fe/src/main/java/org/apache/impala/catalog/Column.java@172 PS5, Line 172: > currently, need to reuse the field_id in IcebergStructField, to change this I see you added PaimonStructField. Done. http://gerrit.cloudera.org:8080/#/c/23613/8/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java File fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java: http://gerrit.cloudera.org:8080/#/c/23613/8/fe/src/main/java/org/apache/impala/planner/PaimonScanNode.java@294 PS8, Line 294: setMinMemReservationBytes(512) Why minimum is 512 here? Should this be equal to 1 arrow batch byte size? http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/23613/5/testdata/datasets/functional/functional_schema_template.sql@4845 PS5, Line 4845: paimon_primitive_alltypes > Done Done http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/23613/5/tests/query_test/test_scanners.py@2040 PS5, Line 2040: f _test_limit(self, vector): > Done Done -- To view, visit http://gerrit.cloudera.org:8080/23613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384 Gerrit-Change-Number: 23613 Gerrit-PatchSet: 8 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Mon, 17 Nov 2025 04:55:25 +0000 Gerrit-HasComments: Yes
