andygrove opened a new pull request, #3272:
URL: https://github.com/apache/datafusion-comet/pull/3272
## Which issue does this PR close?

This PR adds support for the `native_iceberg_compat` scan implementation with V2 data sources (`BatchScanExec` with `ParquetScan`).

## Rationale for this change

Previously, V2 Parquet scans always used the legacy `BatchReader` (the `native_comet` approach), regardless of the `COMET_NATIVE_SCAN_IMPL` setting. This was inconsistent with V1 scans, which respect the scan implementation configuration.

This change enables V2 scans to use the DataFusion-based `NativeBatchReader` when `native_iceberg_compat` or `auto` is specified. This is an important step toward deprecating the legacy mutable buffer code.

## What changes are included in this PR?

### New Files

- `CometNativeParquetPartitionReaderFactory`: A V2 partition reader factory that uses `NativeBatchReader` (the DataFusion-based Parquet reader)
- `CometNativeParquetScan`: A V2 scan trait that creates the new reader factory

### Behavior

- V2 scans with `auto` or `native_iceberg_compat`: use `CometNativeParquetScan` (DataFusion-based reader)
- V2 scans with `native_comet`: use the existing `CometParquetScan` (legacy JNI-based `BatchReader`)
- V2 scans with `native_datafusion`: fall back to Spark (not yet supported for V2)

## How are these changes tested?

- Updated `CometScanRuleSuite` with tests for V2 scan behavior
- Updated `ParquetReadV2Suite` to verify that V2 scans work with the new implementation
- All existing tests pass
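The scan-selection rule listed under **Behavior** can be summarized as a single match on the configured scan implementation. The following is a minimal, dependency-free sketch under stated assumptions: the names `ScanImpl`, `V2ScanChoice`, and `selectV2Scan` are hypothetical and do not reflect Comet's actual `CometScanRule` code. Only the four string values and the class names in the comments come from this PR.

```scala
// Hypothetical sketch of the V2 scan-selection rule described in this PR.
// ScanImpl, V2ScanChoice and selectV2Scan are illustrative names, not Comet's API.

sealed trait ScanImpl
object ScanImpl {
  case object NativeComet extends ScanImpl
  case object NativeDataFusion extends ScanImpl
  case object NativeIcebergCompat extends ScanImpl
  case object Auto extends ScanImpl

  // Parse a configured value such as "native_iceberg_compat" (case-insensitive).
  def parse(value: String): Option[ScanImpl] = value.trim.toLowerCase match {
    case "native_comet"          => Some(NativeComet)
    case "native_datafusion"     => Some(NativeDataFusion)
    case "native_iceberg_compat" => Some(NativeIcebergCompat)
    case "auto"                  => Some(Auto)
    case _                       => None
  }
}

sealed trait V2ScanChoice
object V2ScanChoice {
  // Stands in for CometNativeParquetScan (DataFusion-based NativeBatchReader).
  case object NativeParquetScan extends V2ScanChoice
  // Stands in for CometParquetScan (legacy JNI-based BatchReader).
  case object LegacyParquetScan extends V2ScanChoice
  // The BatchScanExec is left to Spark, with a reason for the fallback.
  final case class FallbackToSpark(reason: String) extends V2ScanChoice
}

// Map the configured scan implementation to the V2 scan to use.
def selectV2Scan(impl: ScanImpl): V2ScanChoice = {
  import ScanImpl._, V2ScanChoice._
  impl match {
    case Auto | NativeIcebergCompat => NativeParquetScan
    case NativeComet                => LegacyParquetScan
    case NativeDataFusion =>
      FallbackToSpark("native_datafusion is not yet supported for V2 scans")
  }
}
```

For example, `ScanImpl.parse("auto").map(selectV2Scan)` yields `Some(NativeParquetScan)`. Because both traits are sealed, the compiler checks that every configured value is handled, which is one way to keep the V1 and V2 paths from silently diverging again.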
