andygrove opened a new pull request, #3416:
URL: https://github.com/apache/datafusion-comet/pull/3416

   ## Summary
   
   Closes #3311.
   
   - Add Spark-compatible schema validation in the native schema adapter, gated behind the new config `spark.comet.parquet.schemaValidation.enabled` (default: `true`)
   - When enabled, the native scan rejects type coercions that Spark's vectorized Parquet reader would reject (TimestampLTZ↔TimestampNTZ, integer/float widening without schema evolution, string→numeric, etc.)
   - Pass `schema_evolution_enabled` to the native side via proto so integer/float widening is allowed when Comet's schema evolution config is enabled
   - Native exceptions carrying schema validation errors are wrapped in `SparkException` with compatible error messages
   - Un-ignore 5 Spark SQL tests that now pass with `native_datafusion`
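
For reviewers who want a mental model of the rule described in the bullets above, here is a minimal sketch of the decision, not the code in this PR. `ColumnType` and `validate_read` are hypothetical names introduced for illustration. The real adapter works on Arrow types, and its error text is meant to match Spark's, which the message below does not attempt. The widening pairs shown (int32→int64, float32→float64) are assumed examples of "integer/float widening".

```rust
/// Simplified stand-in for the column types named in this PR.
/// Hypothetical: this is not the Arrow `DataType` enum used by the real adapter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    TimestampLtz,
    TimestampNtz,
}

/// Hypothetical check: may a column stored as `file_type` be read as `read_type`?
/// Returns `Ok(())` when the read is allowed and an illustrative message otherwise.
pub fn validate_read(
    file_type: ColumnType,
    read_type: ColumnType,
    schema_evolution_enabled: bool,
) -> Result<(), String> {
    use ColumnType::*;

    // Identical types never need a coercion.
    if file_type == read_type {
        return Ok(());
    }

    // Assumed widening pairs; allowed only when schema evolution is enabled.
    let is_widening = matches!(
        (file_type, read_type),
        (Int32, Int64) | (Float32, Float64)
    );
    if is_widening && schema_evolution_enabled {
        return Ok(());
    }

    // Everything else (LTZ <-> NTZ, string -> numeric, widening without
    // schema evolution, ...) is rejected in this sketch.
    Err(format!(
        "incompatible Parquet read: file type {:?}, requested type {:?}",
        file_type, read_type
    ))
}
```

Under these assumptions, `validate_read(ColumnType::Int32, ColumnType::Int64, false)` returns an error, while the same call with `true` succeeds. A TimestampLTZ column read as TimestampNTZ is rejected regardless of the flag.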
   
   ### Tests un-ignored
   - `ParquetIOSuite`: "SPARK-35640: read binary as timestamp should throw schema incompatible error"
   - `ParquetIOSuite`: "SPARK-35640: int as long should throw schema incompatible error"
   - `ParquetQuerySuite`: "SPARK-36182: can't read TimestampLTZ as TimestampNTZ"
   - `ParquetQuerySuite`: "row group skipping doesn't overflow when reading into larger type"
   - `ParquetFilterSuite`: "SPARK-25207: exception when duplicate fields in case-insensitive mode"
   
   ### Tests still ignored (can't match Spark exception types from native)
   - `ParquetSchemaSuite`: "SPARK-45604" and "schema mismatch failure error message": these check for `SchemaColumnConvertNotSupportedException`
   - `FileBasedDataSourceSuite`: "caseSensitive": checks for `SparkRuntimeException` with error class
   - `ParquetQuerySuite`: "SPARK-34212": different decimal handling
   
   ## Test plan
   - [x] Rust native build passes
   - [x] Rust unit tests pass (schema_adapter tests)
   - [x] Clippy: no warnings
   - [x] Spotless formatting passes
   - [x] `ParquetReadV1Suite`: all tests pass (including schema evolution, type widening)
   - [ ] CI: Spark SQL tests with `native_datafusion` should show 5 fewer failures
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


