kosiew opened a new pull request, #20489: URL: https://github.com/apache/datafusion/pull/20489
# PR Title

Introduce OwnedCastOptions and OwnedFormatOptions; strengthen CastColumnExpr validation and schema-aware construction

---

## Which issue does this PR close?

* [Comment](https://github.com/apache/datafusion/pull/20202#discussion_r2804851175) on #20202

---

## Rationale for this change

This PR improves the flexibility and safety of casting behavior in DataFusion by:

1. Introducing owned variants of `FormatOptions` and `CastOptions` to allow dynamic and runtime-configurable format strings without lifetime constraints.
2. Strengthening validation logic in `CastColumnExpr` to catch schema mismatches, invalid casts, and nullability violations earlier in the planning phase.
3. Making struct and nested field compatibility checks reusable across modules.

These changes help prevent subtle runtime errors, improve error messages, and make casting behavior more robust and predictable, especially in schema adaptation scenarios (e.g., Parquet file schema vs. table schema mismatches).

---

## What changes are included in this PR?

### 1. Owned Format and Cast Options

* Added `OwnedFormatOptions` as an owned version of Arrow's `FormatOptions` (using `String` instead of `&str`).
* Added `OwnedCastOptions` as an owned version of Arrow's `CastOptions`, embedding `OwnedFormatOptions`.
* Implemented conversions:
  * `OwnedCastOptions::from_arrow_options`
  * `OwnedCastOptions::as_arrow_options`
  * `OwnedFormatOptions::as_arrow_options`
* Re-exported `OwnedCastOptions` and `OwnedFormatOptions` from `datafusion_common`.

This enables dynamic formatting configuration without requiring `'static` lifetimes.

---

### 2. CastColumnExpr Refactor and Validation

* Replaced `CastOptions<'static>` with `OwnedCastOptions` in `CastColumnExpr`.
* Added `input_schema` to `CastColumnExpr` to enable schema-aware validation.
* Introduced:
  * `new_with_schema` constructor (returns `Result<Self>`)
  * Internal `build` constructor with centralized validation
* Added validation helpers:
  * Column index bounds checking
  * Expression return type validation
  * Nullability checks (reject nullable → non-nullable casts)
  * Struct compatibility validation via `validate_struct_compatibility`
  * Field compatibility validation via newly public `validate_field_compatibility`
* Improved error reporting using `plan_err!`.

These changes ensure invalid casts are rejected during expression construction rather than failing later during evaluation.

---

### 3. Nested Struct Validation Improvements

* Made `validate_field_compatibility` public.
* Reused struct validation logic inside `CastColumnExpr`.

---

### 4. Physical Expr Adapter Updates

* Updated schema rewriter to use `CastColumnExpr::new_with_schema`.
* Adjusted tests to handle fallible constructors.

---

### 5. Test Updates and Additions

* Updated existing tests to use the new fallible constructors.
* Added tests for:
  * Rejecting nullable → non-nullable casts
  * Rejecting column index out-of-bounds
* Updated several Parquet-related tests to align nullability expectations.

---

## Are these changes tested?

Yes.

* Existing tests were updated to use the new fallible constructors.
* New unit tests were added to verify:
  * Column index bounds validation
  * Nullable-to-non-nullable cast rejection
  * Struct and nested casting behavior
* Parquet schema adapter tests were updated to reflect nullability handling changes.

These tests help ensure casting correctness, schema safety, and error reporting behavior.

---

## Are there any user-facing changes?

Yes, but limited:

* Invalid casts (e.g., nullable → non-nullable, incompatible struct casts, or out-of-bounds column references) now fail earlier during expression construction with clearer error messages.
* New public APIs:
  * `OwnedCastOptions`
  * `OwnedFormatOptions`
  * `validate_field_compatibility`

There are no breaking changes to existing public APIs, but behavior is stricter and more defensive. If considered API-impacting, the `api change` label may be appropriate due to new exported types and validation semantics.
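
To illustrate the owned-options idea described in this PR, here is a minimal, self-contained sketch of the owned/borrowed pattern. It does not use the actual Arrow or DataFusion types: `BorrowedFormatOptions`, `BorrowedCastOptions`, `OwnedFormatOptionsSketch`, `OwnedCastOptionsSketch`, and `options_from_runtime_config` are hypothetical stand-ins, and the reduced field set (`null`, `timestamp_format`, `safe`) is an assumption made for brevity. The real `OwnedCastOptions` in this PR may differ in fields and method signatures.

```rust
// Simplified stand-in for a borrowed format-options type such as Arrow's
// `FormatOptions<'a>`. The reduced field set is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct BorrowedFormatOptions<'a> {
    null: &'a str,
    timestamp_format: Option<&'a str>,
}

// Simplified stand-in for a borrowed cast-options type: a `safe` flag plus
// embedded format options.
#[derive(Debug, Clone, PartialEq)]
struct BorrowedCastOptions<'a> {
    safe: bool,
    format_options: BorrowedFormatOptions<'a>,
}

// Owned counterpart: `String` instead of `&str`, so there is no lifetime
// parameter and values can come from runtime configuration.
#[derive(Debug, Clone, PartialEq)]
struct OwnedFormatOptionsSketch {
    null: String,
    timestamp_format: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct OwnedCastOptionsSketch {
    safe: bool,
    format_options: OwnedFormatOptionsSketch,
}

impl OwnedFormatOptionsSketch {
    // Copy the borrowed strings into owned storage.
    fn from_borrowed(options: &BorrowedFormatOptions<'_>) -> Self {
        Self {
            null: options.null.to_owned(),
            timestamp_format: options.timestamp_format.map(String::from),
        }
    }

    // Lend the owned strings back out as a view whose lifetime is tied to `self`.
    fn as_borrowed(&self) -> BorrowedFormatOptions<'_> {
        BorrowedFormatOptions {
            null: self.null.as_str(),
            timestamp_format: self.timestamp_format.as_deref(),
        }
    }
}

impl OwnedCastOptionsSketch {
    fn from_borrowed(options: &BorrowedCastOptions<'_>) -> Self {
        Self {
            safe: options.safe,
            format_options: OwnedFormatOptionsSketch::from_borrowed(&options.format_options),
        }
    }

    fn as_borrowed(&self) -> BorrowedCastOptions<'_> {
        BorrowedCastOptions {
            safe: self.safe,
            format_options: self.format_options.as_borrowed(),
        }
    }
}

// A format string that only exists at runtime can be stored directly; with a
// `'static` borrowed type it would have to be leaked or hard-coded.
fn options_from_runtime_config(safe: bool, timestamp_format: String) -> OwnedCastOptionsSketch {
    OwnedCastOptionsSketch {
        safe,
        format_options: OwnedFormatOptionsSketch {
            null: String::new(),
            timestamp_format: Some(timestamp_format),
        },
    }
}
```

In this sketch the owned struct is what an expression would store, and the borrowed view is created only at the point where a kernel expecting the borrowed type is called, so no `'static` lifetime is required anywhere.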
