sgrebnov opened a new issue, #17305:
URL: https://github.com/apache/datafusion/issues/17305

   ### Describe the bug
   
   After upgrading from DataFusion 47 to a newer version, I started seeing schema mismatch errors. They are caused by the updated array type coercion logic, which does not preserve nullability information for nested types. For example:
   
   >SELECT offset[2]-offset[1] FROM rd;
   Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
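
   To illustrate why a difference in item nullability alone is enough to trigger this error, here is a minimal sketch using simplified stand-in types. The `Field`, `DataType`, `list_of`, and `check_column` names below are hypothetical and std-only; they are not the real arrow-rs types or DataFusion's actual validation code. The sketch only assumes that type equality includes the nested field's `nullable` flag, which is what the error message above suggests.

   ```rust
   // Simplified stand-ins for Arrow's `Field` and `DataType` (assumption:
   // illustration only, NOT the real arrow-rs definitions).
   #[derive(Debug, Clone, PartialEq)]
   struct Field {
       name: String,
       data_type: DataType,
       nullable: bool,
   }

   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int32,
       List(Box<Field>),
   }

   // Builds a list type whose item field is named "item".
   fn list_of(item: DataType, nullable: bool) -> DataType {
       DataType::List(Box::new(Field {
           name: "item".to_string(),
           data_type: item,
           nullable,
       }))
   }

   // Hypothetical check in the spirit of "column types must match schema types":
   // the comparison is structural, so the nested `nullable` flag takes part in it.
   fn check_column(schema_type: &DataType, column_type: &DataType) -> Result<(), String> {
       if schema_type == column_type {
           Ok(())
       } else {
           Err(format!("expected {schema_type:?} but found {column_type:?}"))
       }
   }
   ```

   With these definitions, `check_column(&list_of(DataType::Int32, true), &list_of(DataType::Int32, false))` returns an `Err`, even though both sides are lists of `Int32`.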
   
   ### To Reproduce
   
   The unit test below reproduces the behavior. It currently fails with the following assertion error:
   
   >assertion `left == right` failed
     left: [[List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
    right: [[List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
   stack backtrace:
   
   ```rust
   #[test]
   fn test_get_valid_types_fixed_size_arrays() -> Result<()> {
       let function = "fixed_size_arrays";
       // Signature that accepts two array arguments.
       let signature = Signature::arrays(2, None, Volatility::Immutable);

       // Both inputs declare non-nullable items.
       let data_types = vec![
           DataType::new_fixed_size_list(DataType::Int64, 3, false),
           DataType::new_list(DataType::Int32, false),
       ];
       // Expected: the coerced list types keep the non-nullable item field.
       assert_eq!(
           get_valid_types(function, &signature.type_signature, &data_types)?,
           vec![vec![
               DataType::new_list(DataType::Int64, false),
               DataType::new_list(DataType::Int64, false),
           ]]
       );

       Ok(())
   }
   ```
   
   This can also be observed by adding tracing to `coerce_arguments_for_signature_with_scalar_udf`. In the output that follows the function, the item field `data_type: Int32, nullable: false` of the input type becomes `data_type: Int32, nullable: true` in the coerced type.
   
   ```rust
   /// Returns `expressions` coerced to types compatible with
   /// `signature`, if possible.
   ///
   /// See the module level documentation for more detail on coercion.
   fn coerce_arguments_for_signature_with_scalar_udf(
       expressions: Vec<Expr>,
       schema: &DFSchema,
       func: &ScalarUDF,
   ) -> Result<Vec<Expr>> {
       if expressions.is_empty() {
           return Ok(expressions);
       }
   
       let current_types = expressions
           .iter()
           .map(|e| e.get_type(schema))
           .collect::<Result<Vec<_>>>()?;
   
       let new_types = data_types_with_scalar_udf(&current_types, func)?;
   
       println!("schema: {:?}", schema);
       println!("current_types: {:?}", current_types);
       println!("Coerced types: {:?}", new_types);
   
       expressions
           .into_iter()
           .enumerate()
           .map(|(i, expr)| expr.cast_to(&new_types[i], schema))
           .collect()
   }
   ```
   
   ```console
   schema: DFSchema { inner: Schema { fields: [Field { name: "offset", data_type: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"content_computed_columns": "content_embedding,content_offset"} }, field_qualifiers: [Some(Bare { table: "rd" })], functional_dependencies: FunctionalDependencies { deps: [] } }
   
   current_types: [FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), Int64]
   
   Coerced types: [List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64]
   ```
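
   The difference between the two behaviors can be sketched with simplified stand-in types. This is only an illustration under stated assumptions: `Field`, `DataType`, `coerce_preserving`, and `coerce_lossy` are hypothetical std-only names, not the real arrow-rs types or DataFusion's coercion code. The lossy variant imitates the traced output above (inner field rebuilt with `nullable: true`), while the preserving variant reuses the original inner field.

   ```rust
   // Simplified stand-ins for Arrow's `Field` and `DataType` (assumption:
   // illustration only, NOT the real arrow-rs definitions).
   #[derive(Debug, Clone, PartialEq)]
   struct Field {
       name: String,
       data_type: DataType,
       nullable: bool,
   }

   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int32,
       Int64,
       List(Box<Field>),
       FixedSizeList(Box<Field>, i32),
   }

   // Rewrites FixedSizeList to List while reusing the inner field, so the
   // item's name and nullability carry over unchanged.
   fn coerce_preserving(dt: &DataType) -> DataType {
       match dt {
           DataType::FixedSizeList(field, _) => DataType::List(field.clone()),
           other => other.clone(),
       }
   }

   // Rewrites FixedSizeList to List but rebuilds the inner field with
   // `nullable: true`, discarding the original flag (the reported symptom).
   fn coerce_lossy(dt: &DataType) -> DataType {
       match dt {
           DataType::FixedSizeList(field, _) => DataType::List(Box::new(Field {
               name: field.name.clone(),
               data_type: field.data_type.clone(),
               nullable: true,
           })),
           other => other.clone(),
       }
   }
   ```

   For a `FixedSizeList` of non-nullable `Int32` items, `coerce_preserving` yields a `List` whose item is still non-nullable, whereas `coerce_lossy` yields a `List` with a nullable item, which no longer matches a schema that was derived from the original field.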
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   The original (correct) behavior was changed by the following improvement:
   https://github.com/apache/datafusion/pull/15149#discussion_r2296274979


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

