sgrebnov opened a new issue, #17305:
URL: https://github.com/apache/datafusion/issues/17305
### Describe the bug
After upgrading from DataFusion 47 to a newer version, I've started seeing schema mismatch errors. They are caused by the updated array type coercion logic, which does not preserve nullability information for nested types:
> `SELECT offset[2]-offset[1] FROM rd;`
>
> Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
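
The error message appears to come from Arrow's check that each column's type equals the schema's type when the output batch is assembled. For list types that comparison includes the item field, so two lists that differ only in item nullability are different types. The sketch below illustrates this with a simplified, hypothetical model of Arrow's `DataType` and `Field` (not the real arrow-rs types); `list_of` and `check_column_type` are made-up helpers, the latter only mimicking the shape of the error message.

```rust
// Simplified, hypothetical model of Arrow's nested types (not arrow-rs).
// A list type embeds its full item Field, so the item's nullability is part
// of the list type's identity and takes part in equality.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Hypothetical helper: build a list type whose item field is named "item".
fn list_of(item: DataType, nullable: bool) -> DataType {
    DataType::List(Box::new(Field {
        name: "item".to_string(),
        data_type: item,
        nullable,
    }))
}

// Hypothetical helper: strict type equality, similar in spirit to the
// schema check that rejects the batch in this issue.
fn check_column_type(index: usize, expected: &DataType, found: &DataType) -> Result<(), String> {
    if expected == found {
        Ok(())
    } else {
        Err(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            expected, found, index
        ))
    }
}

fn main() {
    // Type in the planned schema: list items marked nullable by coercion.
    let expected = list_of(DataType::Int32, true);
    // Type of the column produced at runtime: list items still non-nullable.
    let found = list_of(DataType::Int32, false);
    match check_column_type(0, &expected, &found) {
        Ok(()) => println!("types match"),
        Err(e) => println!("error: {}", e),
    }
}
```

Only the item nullability differs between the two types, yet the strict comparison rejects the column.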
### To Reproduce
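The unit test below can be used to verify this behavior. It currently fails because the coerced list items come back nullable (`left`, the actual result) while the test expects them to stay non-nullable (`right`):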
> assertion `left == right` failed
>
> left: `[[List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]]`
>
> right: `[[List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })]]`
```rust
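// Assumes this sits in the test module of the file that defines `get_valid_types`,
// with `DataType`, `Signature`, `Volatility` and `Result` in scope.
#[test]
fn test_get_valid_types_fixed_size_arrays() -> Result<()> {
    let function = "fixed_size_arrays";
    let signature = Signature::arrays(2, None, Volatility::Immutable);
    // Both inputs have non-nullable list items.
    let data_types = vec![
        DataType::new_fixed_size_list(DataType::Int64, 3, false),
        DataType::new_list(DataType::Int32, false),
    ];
    // Expected: both arguments coerced to List(Int64), items still non-nullable.
    assert_eq!(
        get_valid_types(function, &signature.type_signature, &data_types)?,
        vec![vec![
            DataType::new_list(DataType::Int64, false),
            DataType::new_list(DataType::Int64, false),
        ]]
    );
    Ok(())
}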
```
This can also be observed by adding tracing to `coerce_arguments_for_signature_with_scalar_udf`. Note that the input item field `data_type: Int32, nullable: false` has become `data_type: Int32, nullable: true` in the coerced type. The conversion from `FixedSizeList` to `List` is expected; only the item nullability is lost.
```rust
/// Returns `expressions` coerced to types compatible with
/// `signature`, if possible.
///
/// See the module level documentation for more detail on coercion.
fn coerce_arguments_for_signature_with_scalar_udf(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    func: &ScalarUDF,
) -> Result<Vec<Expr>> {
    if expressions.is_empty() {
        return Ok(expressions);
    }

    // Argument types as resolved against the input schema.
    let current_types = expressions
        .iter()
        .map(|e| e.get_type(schema))
        .collect::<Result<Vec<_>>>()?;

    // Types the UDF's signature coerces the arguments to.
    let new_types = data_types_with_scalar_udf(&current_types, func)?;

    // Added tracing: compare argument types before and after coercion.
    println!("schema: {:?}", schema);
    println!("current_types: {:?}", current_types);
    println!("Coerced types: {:?}", new_types);

    // Cast each argument expression to its coerced type.
    expressions
        .into_iter()
        .enumerate()
        .map(|(i, expr)| expr.cast_to(&new_types[i], schema))
        .collect()
}
```
```console
schema: DFSchema { inner: Schema { fields: [Field { name: "offset", data_type: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"content_computed_columns": "content_embedding,content_offset"} }, field_qualifiers: [Some(Bare { table: "rd" })], functional_dependencies: FunctionalDependencies { deps: [] } }
current_types: [FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), Int64]
Coerced types: [List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64]
```
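One way to keep nested nullability is to carry each input's item `nullable` flag through coercion instead of rebuilding the item field with a default. The sketch below uses a simplified, hypothetical type model (it is not DataFusion's `get_valid_types` or the arrow-rs types), and `coerce_arrays`, `list_parts` and `common_item_type` are made-up helpers. Its rules are assumptions for illustration: `FixedSizeList` becomes `List`, item types are widened to a common type (only `Int32`/`Int64` are modeled), and the coerced items are nullable only if some input's items are nullable. With the inputs from the unit test above, both arguments coerce to a `List` of non-nullable `Int64`.

```rust
// Simplified, hypothetical type model; not the arrow-rs or DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    List { item: Box<DataType>, item_nullable: bool },
    FixedSizeList { item: Box<DataType>, item_nullable: bool, size: i32 },
}

// Widen two item types to a common type (only Int32/Int64 are modeled).
fn common_item_type(a: &DataType, b: &DataType) -> Option<DataType> {
    match (a, b) {
        (DataType::Int32, DataType::Int32) => Some(DataType::Int32),
        (DataType::Int32 | DataType::Int64, DataType::Int32 | DataType::Int64) => Some(DataType::Int64),
        _ => None,
    }
}

// Split a list-like type into (item type, item nullability).
fn list_parts(dt: &DataType) -> Option<(&DataType, bool)> {
    match dt {
        DataType::List { item, item_nullable }
        | DataType::FixedSizeList { item, item_nullable, .. } => Some((&**item, *item_nullable)),
        _ => None,
    }
}

// Coerce every argument to List<common item type>, keeping the items
// non-nullable unless at least one input declares nullable items.
fn coerce_arrays(args: &[DataType]) -> Option<Vec<DataType>> {
    let (first, rest) = args.split_first()?;
    let (first_item, mut item_nullable) = list_parts(first)?;
    let mut item = first_item.clone();
    for arg in rest {
        let (next_item, next_nullable) = list_parts(arg)?;
        item = common_item_type(&item, next_item)?;
        item_nullable |= next_nullable;
    }
    let coerced = DataType::List { item: Box::new(item), item_nullable };
    Some(vec![coerced; args.len()])
}

fn main() {
    // Same shapes as the unit test: FixedSizeList(Int64, 3) and List(Int32),
    // both with non-nullable items.
    let args = vec![
        DataType::FixedSizeList { item: Box::new(DataType::Int64), item_nullable: false, size: 3 },
        DataType::List { item: Box::new(DataType::Int32), item_nullable: false },
    ];
    let coerced = coerce_arrays(&args).expect("arguments should coerce");
    println!("{:?}", coerced);
}
```

Taking the logical OR of the input flags is a conservative choice: when inputs disagree, the coerced items are marked nullable, but inputs that are all non-nullable stay non-nullable, which is what the unit test expects.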
### Expected behavior
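Coerced list types should keep the nullability of the input item fields. For example, a `FixedSizeList` with non-nullable `Int64` items should coerce to a `List` with non-nullable items, as the unit test above expects.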
### Additional context
The original (correct) behavior was changed by the following improvement (PR #15149):
https://github.com/apache/datafusion/pull/15149#discussion_r2296274979