This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
     new 7886a3d45 docs: fix inaccurate claim about mutable buffers in parquet scan docs (#3378)
7886a3d45 is described below
commit 7886a3d452989008e50a382bef120ca10bb08477
Author: Andy Grove <[email protected]>
AuthorDate: Wed Feb 4 05:46:26 2026 -0700
    docs: fix inaccurate claim about mutable buffers in parquet scan docs (#3378)
---
docs/source/contributor-guide/parquet_scans.md | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/docs/source/contributor-guide/parquet_scans.md b/docs/source/contributor-guide/parquet_scans.md
index 626b65b4e..bbacff4d9 100644
--- a/docs/source/contributor-guide/parquet_scans.md
+++ b/docs/source/contributor-guide/parquet_scans.md
@@ -37,9 +37,13 @@ implementation:
- Leverages the DataFusion community's ongoing improvements to `DataSourceExec`
- Provides support for reading complex types (structs, arrays, and maps)
-- Removes the use of reusable mutable-buffers in Comet, which is complex to maintain
+- Delegates Parquet decoding to native Rust code rather than JVM-side decoding
- Improves performance
+> **Note on mutable buffers:** Both `native_comet` and `native_iceberg_compat` use reusable mutable buffers
+> when transferring data from JVM to native code via Arrow FFI. The `native_iceberg_compat` implementation uses DataFusion's native Parquet reader for data columns, bypassing Comet's mutable buffer infrastructure entirely. However, partition columns still use `ConstantColumnReader`, which relies on Comet's mutable buffers that are reused across batches. This means native operators that buffer data (such as `SortExec` or `ShuffleWriterExec`) must perform deep copies to avoid data corruption.
+> See the [FFI documentation](ffi.md) for details on the `arrow_ffi_safe` flag and ownership semantics.
+
The `native_datafusion` and `native_iceberg_compat` scans share the following limitations:
- When reading Parquet files written by systems other than Spark that contain columns with the logical type `UINT_8`
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]