kosiew opened a new pull request, #20489:
URL: https://github.com/apache/datafusion/pull/20489

   # PR Title
   
   Introduce OwnedCastOptions and OwnedFormatOptions; strengthen CastColumnExpr 
validation and schema-aware construction
   
   ---
   
   ## Which issue does this PR close?
   
   * [Comment](https://github.com/apache/datafusion/pull/20202#discussion_r2804851175) on #20202
   
   ---
   
   ## Rationale for this change
   
   This PR improves the flexibility and safety of casting behavior in 
DataFusion by:
   
   1. Introducing owned variants of `FormatOptions` and `CastOptions` to allow 
dynamic and runtime-configurable format strings without lifetime constraints.
   2. Strengthening validation logic in `CastColumnExpr` to catch schema 
mismatches, invalid casts, and nullability violations earlier in the planning 
phase.
   3. Making struct and nested field compatibility checks reusable across 
modules.
   
   These changes help prevent subtle runtime errors, improve error messages, 
and make casting behavior more robust and predictable, especially in schema 
adaptation scenarios (e.g., Parquet file schema vs. table schema mismatches).
   
   ---
   
   ## What changes are included in this PR?
   
   ### 1. Owned Format and Cast Options
   
   * Added `OwnedFormatOptions` as an owned version of Arrow's `FormatOptions` 
(using `String` instead of `&str`).
   * Added `OwnedCastOptions` as an owned version of Arrow's `CastOptions`, 
embedding `OwnedFormatOptions`.
   * Implemented conversions:
   
     * `OwnedCastOptions::from_arrow_options`
     * `OwnedCastOptions::as_arrow_options`
     * `OwnedFormatOptions::as_arrow_options`
   * Re-exported `OwnedCastOptions` and `OwnedFormatOptions` from 
`datafusion_common`.
   
   Because the options own their strings, format strings can be built at 
runtime instead of requiring `'static` lifetimes.
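
   The owned/borrowed round trip described above can be sketched roughly as 
follows. This is a minimal, dependency-free illustration: `FormatOptions<'a>` 
and `CastOptions<'a>` here are simplified stand-ins for the Arrow types (the 
real ones have more fields and a builder-style API), and the two fields shown 
are assumptions chosen for brevity. Only the method names are taken from this 
PR description.

```rust
/// Simplified stand-in for Arrow's borrowed `FormatOptions<'a>`.
/// Assumption: only two fields are modelled; the real type has more.
#[derive(Debug, Clone, PartialEq)]
pub struct FormatOptions<'a> {
    pub null: &'a str,
    pub timestamp_format: Option<&'a str>,
}

/// Simplified stand-in for Arrow's borrowed `CastOptions<'a>`.
#[derive(Debug, Clone, PartialEq)]
pub struct CastOptions<'a> {
    pub safe: bool,
    pub format_options: FormatOptions<'a>,
}

/// Owned counterpart: `String` instead of `&str`, so no lifetime parameter.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedFormatOptions {
    pub null: String,
    pub timestamp_format: Option<String>,
}

impl OwnedFormatOptions {
    /// Borrow the owned strings as a short-lived borrowed view.
    pub fn as_arrow_options(&self) -> FormatOptions<'_> {
        FormatOptions {
            null: &self.null,
            timestamp_format: self.timestamp_format.as_deref(),
        }
    }
}

/// Owned counterpart of `CastOptions`, embedding `OwnedFormatOptions`.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedCastOptions {
    pub safe: bool,
    pub format_options: OwnedFormatOptions,
}

impl OwnedCastOptions {
    /// Copy every borrowed string into an owned `String`.
    pub fn from_arrow_options(options: &CastOptions<'_>) -> Self {
        Self {
            safe: options.safe,
            format_options: OwnedFormatOptions {
                null: options.format_options.null.to_string(),
                timestamp_format: options
                    .format_options
                    .timestamp_format
                    .map(|s| s.to_string()),
            },
        }
    }

    /// Produce a borrowed view whose lifetime is tied to `self`.
    pub fn as_arrow_options(&self) -> CastOptions<'_> {
        CastOptions {
            safe: self.safe,
            format_options: self.format_options.as_arrow_options(),
        }
    }
}
```

   In this sketch, an `OwnedCastOptions` can be stored in an expression and 
built from a format string computed at runtime, while the borrowed view is 
only created at the call site that needs it.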
   
   ---
   
   ### 2. CastColumnExpr Refactor and Validation
   
   * Replaced `CastOptions<'static>` with `OwnedCastOptions` in 
`CastColumnExpr`.
   * Added `input_schema` to `CastColumnExpr` to enable schema-aware validation.
   * Introduced:
   
     * `new_with_schema` constructor (returns `Result<Self>`)
     * Internal `build` constructor with centralized validation
   * Added validation helpers:
   
     * Column index bounds checking
     * Expression return type validation
     * Nullability checks (reject nullable → non-nullable casts)
     * Struct compatibility validation via `validate_struct_compatibility`
     * Field compatibility validation via newly public 
`validate_field_compatibility`
   * Improved error reporting using `plan_err!`.
   
   These changes ensure invalid casts are rejected during expression 
construction rather than failing later during evaluation.
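
   As an illustration of construction-time validation, the sketch below 
applies three of the checks listed above (index bounds, expression type, 
nullability) before any data is evaluated. Everything in it is hypothetical: 
`Field`, `DataType`, and `validate_cast_column` are stand-ins written for this 
example, and `Result<(), String>` takes the place of the planning errors that 
`plan_err!` would produce. It is not the code in this PR.

```rust
/// Hypothetical, heavily reduced stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
}

/// Hypothetical stand-in for Arrow's `Field`.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type, nullable }
    }
}

/// Validate a column cast against the input schema before building the
/// expression. Returns a descriptive error instead of failing at evaluation.
pub fn validate_cast_column(
    input_schema: &[Field],
    column_index: usize,
    expected_input: &Field,
    target: &Field,
) -> Result<(), String> {
    // 1. Column index bounds checking.
    let actual = match input_schema.get(column_index) {
        Some(field) => field,
        None => {
            return Err(format!(
                "column index {} is out of bounds for a schema with {} fields",
                column_index,
                input_schema.len()
            ))
        }
    };

    // 2. The column's type must match what the cast expects as input.
    if actual.data_type != expected_input.data_type {
        return Err(format!(
            "column '{}' has type {:?}, but the cast expects {:?}",
            actual.name, actual.data_type, expected_input.data_type
        ));
    }

    // 3. Reject nullable -> non-nullable casts.
    if actual.nullable && !target.nullable {
        return Err(format!(
            "cannot cast nullable column '{}' to non-nullable field '{}'",
            actual.name, target.name
        ));
    }

    Ok(())
}
```

   A fallible constructor in this style would run such checks first and only 
then build the expression, so callers handle the error with `?` at 
construction time.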
   
   ---
   
   ### 3. Nested Struct Validation Improvements
   
   * Made `validate_field_compatibility` public.
   * Reused struct validation logic inside `CastColumnExpr`.
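
   To show how a field-level check and a struct-level check can call each 
other recursively, here is a small self-contained sketch. The function names 
mirror those mentioned in this PR, but the signatures, the `Field` and 
`DataType` stand-ins, and the specific rules (match by name, allow missing 
target fields only when nullable, allow `Int32` to `Int64` widening) are 
assumptions made for illustration and may differ from DataFusion's actual 
behavior.

```rust
/// Hypothetical stand-in for Arrow's `DataType`, with a nested struct variant.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

/// Hypothetical stand-in for Arrow's `Field`.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type, nullable }
    }
}

/// Assumed castability rule for leaf types: identical, or Int32 -> Int64.
fn can_cast(source: &DataType, target: &DataType) -> bool {
    source == target || matches!((source, target), (DataType::Int32, DataType::Int64))
}

/// Check one source field against one target field, recursing into structs.
pub fn validate_field_compatibility(source: &Field, target: &Field) -> Result<(), String> {
    if source.nullable && !target.nullable {
        return Err(format!(
            "field '{}' is nullable in the source but non-nullable in the target",
            target.name
        ));
    }
    match (&source.data_type, &target.data_type) {
        (DataType::Struct(s), DataType::Struct(t)) => validate_struct_compatibility(s, t),
        (s, t) if can_cast(s, t) => Ok(()),
        (s, t) => Err(format!(
            "cannot cast field '{}' from {:?} to {:?}",
            target.name, s, t
        )),
    }
}

/// Check every target field: it must either match a source field by name
/// and be compatible, or be nullable so it can be filled with nulls.
pub fn validate_struct_compatibility(source: &[Field], target: &[Field]) -> Result<(), String> {
    for t in target {
        match source.iter().find(|s| s.name == t.name) {
            Some(s) => validate_field_compatibility(s, t)?,
            None if t.nullable => {} // assumed: missing nullable fields are null-filled
            None => {
                return Err(format!(
                    "non-nullable target field '{}' is missing from the source struct",
                    t.name
                ))
            }
        }
    }
    Ok(())
}
```

   Keeping the field-level check public, as in this sketch, lets other 
modules validate a single nested field without going through the struct-level 
entry point.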
   
   ---
   
   ### 4. Physical Expr Adapter Updates
   
   * Updated schema rewriter to use `CastColumnExpr::new_with_schema`.
   * Adjusted tests to handle fallible constructors.
   
   ---
   
   ### 5. Test Updates and Additions
   
   * Updated existing tests to use the new fallible constructors.
   * Added tests for:
   
     * Rejecting nullable → non-nullable casts
     * Rejecting column index out-of-bounds
   * Updated several Parquet-related tests to align nullability expectations.
   
   ---
   
   ## Are these changes tested?
   
   Yes.
   
   * Existing tests were updated to use the new fallible constructors.
   * New unit tests were added to verify:
   
     * Column index bounds validation
     * Nullable-to-non-nullable cast rejection
     * Struct and nested casting behavior
   * Parquet schema adapter tests were updated to reflect nullability handling 
changes.
   
   Together, these tests cover casting correctness, schema safety, and error 
reporting.
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes, but limited:
   
   * Invalid casts (e.g., nullable → non-nullable, incompatible struct casts, 
or out-of-bounds column references) now fail earlier during expression 
construction with clearer error messages.
   * New public APIs:
   
     * `OwnedCastOptions`
     * `OwnedFormatOptions`
     * `validate_field_compatibility`
   
   There are no breaking changes to existing public APIs, but behavior is 
stricter and more defensive. If considered API-impacting, the `api change` 
label may be appropriate due to new exported types and validation semantics.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

