andygrove opened a new pull request, #3416: URL: https://github.com/apache/datafusion-comet/pull/3416
## Summary

Closes #3311.

- Add Spark-compatible schema validation in the native schema adapter, gated behind the new config `spark.comet.parquet.schemaValidation.enabled` (default: `true`)
- When enabled, the native scan rejects type coercions that Spark's vectorized Parquet reader would reject (TimestampLTZ↔TimestampNTZ, integer/float widening without schema evolution, string→numeric, etc.)
- Pass `schema_evolution_enabled` to the native side via proto so that integer/float widening is allowed when Comet's schema evolution config is enabled
- Native exceptions carrying schema validation errors are wrapped in `SparkException` with compatible error messages
- Un-ignore 5 Spark SQL tests that now pass with `native_datafusion`

### Tests un-ignored

- `ParquetIOSuite`: "SPARK-35640: read binary as timestamp should throw schema incompatible error"
- `ParquetIOSuite`: "SPARK-35640: int as long should throw schema incompatible error"
- `ParquetQuerySuite`: "SPARK-36182: can't read TimestampLTZ as TimestampNTZ"
- `ParquetQuerySuite`: "row group skipping doesn't overflow when reading into larger type"
- `ParquetFilterSuite`: "SPARK-25207: exception when duplicate fields in case-insensitive mode"

### Tests still ignored (can't match Spark exception types from native)

- `ParquetSchemaSuite`: "SPARK-45604" and "schema mismatch failure error message" (both check for `SchemaColumnConvertNotSupportedException`)
- `FileBasedDataSourceSuite`: "caseSensitive" (checks for `SparkRuntimeException` with error class)
- `ParquetQuerySuite`: "SPARK-34212" (different decimal handling)

## Test plan

- [x] Rust native build passes
- [x] Rust unit tests pass (schema_adapter tests)
- [x] Clippy: no warnings
- [x] Spotless formatting passes
- [x] `ParquetReadV1Suite`: all tests pass (including schema evolution, type widening)
- [ ] CI: Spark SQL tests with `native_datafusion` should show 5 fewer failures

🤖 Generated with [Claude Code](https://claude.com/claude-code)
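For readers who want a feel for the rule described in the summary, here is a minimal, self-contained sketch of a coercion check gated by a schema-evolution flag. It is not the code in this PR: `SimpleType` and `validate_coercion` are hypothetical names invented for illustration, the type list is a small stand-in for the real Arrow/Parquet types, and treating every unlisted conversion as rejected is an assumption made to keep the example short.

```rust
/// Hypothetical, simplified stand-in for the data types involved.
/// This is NOT Arrow's `DataType` or anything defined in Comet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimpleType {
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    Binary,
    TimestampLtz,
    TimestampNtz,
}

/// Decide whether a column stored as `file_type` may be read as `read_type`.
///
/// Assumptions of this sketch (not confirmed by the PR text):
/// - identical types are always accepted;
/// - Int32 -> Int64 and Float32 -> Float64 are the only widenings modeled,
///   and they are accepted only when `schema_evolution_enabled` is true;
/// - every other conversion is rejected.
pub fn validate_coercion(
    file_type: SimpleType,
    read_type: SimpleType,
    schema_evolution_enabled: bool,
) -> Result<(), String> {
    use SimpleType::*;

    if file_type == read_type {
        return Ok(());
    }

    let allowed = match (file_type, read_type) {
        // Widening is only permitted under schema evolution.
        (Int32, Int64) | (Float32, Float64) => schema_evolution_enabled,
        // TimestampLTZ <-> TimestampNTZ, string -> numeric,
        // binary -> timestamp, and anything else fall through to rejection.
        _ => false,
    };

    if allowed {
        Ok(())
    } else {
        // Illustrative message only; it does not reproduce Spark's wording.
        Err(format!(
            "incompatible schema: cannot read {:?} as {:?}",
            file_type, read_type
        ))
    }
}
```

With this sketch, `validate_coercion(SimpleType::Int32, SimpleType::Int64, false)` returns an error, while the same call with `true` succeeds. A timestamp LTZ/NTZ mismatch is rejected regardless of the flag, mirroring the behavior the summary describes.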
