sgrebnov opened a new issue, #17305: URL: https://github.com/apache/datafusion/issues/17305
### Describe the bug

After upgrading from DataFusion 47 to a newer version, I started seeing schema mismatch errors. They are caused by updated array type coercion logic that does not preserve nullability information for nested types.

> SELECT offset[2]-offset[1] FROM rd;
> Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0

### To Reproduce

The unit test below can be used to verify this behavior. It currently fails with:

> assertion `left == right` failed
> left: [[List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
> right: [[List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
> stack backtrace:

```rust
#[test]
fn test_get_valid_types_fixed_size_arrays() -> Result<()> {
    let function = "fixed_size_arrays";
    let signature = Signature::arrays(2, None, Volatility::Immutable);

    // Both inputs have non-nullable elements.
    let data_types = vec![
        DataType::new_fixed_size_list(DataType::Int64, 3, false),
        DataType::new_list(DataType::Int32, false),
    ];

    // The coerced list types are expected to keep `nullable: false`.
    assert_eq!(
        get_valid_types(function, &signature.type_signature, &data_types)?,
        vec![vec![
            DataType::new_list(DataType::Int64, false),
            DataType::new_list(DataType::Int64, false),
        ]]
    );

    Ok(())
}
```

This can also be observed by adding tracing to `coerce_arguments_for_signature_with_scalar_udf`. Observe that `data_type: Int32, nullable: false` has changed to `data_type: Int32, nullable: true` in the coerced type.
```rust
/// Returns `expressions` coerced to types compatible with
/// `signature`, if possible.
///
/// See the module level documentation for more detail on coercion.
fn coerce_arguments_for_signature_with_scalar_udf(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    func: &ScalarUDF,
) -> Result<Vec<Expr>> {
    if expressions.is_empty() {
        return Ok(expressions);
    }

    let current_types = expressions
        .iter()
        .map(|e| e.get_type(schema))
        .collect::<Result<Vec<_>>>()?;

    let new_types = data_types_with_scalar_udf(&current_types, func)?;

    // Tracing added for this report.
    println!("schema: {:?}", schema);
    println!("current_types: {:?}", current_types);
    println!("Coerced types: {:?}", new_types);

    expressions
        .into_iter()
        .enumerate()
        .map(|(i, expr)| expr.cast_to(&new_types[i], schema))
        .collect()
}
```

```console
schema: DFSchema { inner: Schema { fields: [Field { name: "offset", data_type: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"content_computed_columns": "content_embedding,content_offset"} }, field_qualifiers: [Some(Bare { table: "rd" })], functional_dependencies: FunctionalDependencies { deps: [] } }
current_types: [FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), Int64]
Coerced types: [List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64]
```

### Expected behavior

_No response_

### Additional context

The original (correct) behavior was changed by the following improvement: https://github.com/apache/datafusion/pull/15149#discussion_r2296274979
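The expected behavior is left blank above. One reading of the report is that the element field's `nullable` flag should survive the `FixedSizeList` to `List` coercion. The sketch below models that reading with simplified stand-in types. It does not use Arrow's real `Field` and `DataType`, it is not DataFusion's implementation, and both helper function names are hypothetical.

```rust
// Simplified stand-ins for Arrow's `Field` and `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
}

/// Hypothetical helper: converts a fixed-size list type to a variable-length
/// list type and carries the element field over unchanged, including its
/// `nullable` flag. Any other type is returned as-is.
fn fixed_size_list_to_list(data_type: &DataType) -> DataType {
    match data_type {
        DataType::FixedSizeList(field, _len) => DataType::List(field.clone()),
        other => other.clone(),
    }
}

/// Hypothetical lossy variant: rebuilds the element field with
/// `nullable: true`, which mirrors the symptom shown in the trace output.
fn fixed_size_list_to_list_lossy(data_type: &DataType) -> DataType {
    match data_type {
        DataType::FixedSizeList(field, _len) => DataType::List(Box::new(Field {
            name: field.name.clone(),
            data_type: field.data_type.clone(),
            nullable: true,
        })),
        other => other.clone(),
    }
}
```

With a `FixedSizeList` of non-nullable `Int32` items as input, the first helper yields a `List` whose item field is still non-nullable, while the lossy variant yields a nullable item field. That difference is the kind of mismatch Arrow rejects when the produced column type is compared against the schema type.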