jkosh44 opened a new issue, #14451: URL: https://github.com/apache/datafusion/issues/14451
### Describe the bug The following array functions do not properly handle NULL input, they either return an error or panic (non-exhaustive): - `array_sort` - `array_replace` - `array_replace_all` - `array_replace_n` - `array_replace` Discovered in https://github.com/apache/datafusion/pull/14289 ### To Reproduce ## Scalar input ``` > SELECT array_sort([1,2,3], NULL); Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker > SELECT array_resize([1,2,3], NULL, 0); Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker > SELECT array_replace([1,2,3], NULL, NULL); thread 'main' panicked at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData left: Int64 right: Null stack backtrace: 0: rust_begin_unwind at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5 1: core::panicking::panic_fmt at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14 2: core::panicking::assert_failed_inner 3: core::panicking::assert_failed at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5 4: arrow_data::transform::MutableArrayData::with_capacities at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13 5: datafusion_functions_nested::replace::general_replace at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:303:23 6: datafusion_functions_nested::replace::array_replace_inner at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:394:13 7: core::ops::function::Fn::call at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:79:5 8: datafusion_functions_nested::utils::make_scalar_function::{{closure}} at /home/joe/Projects/datafusion/datafusion/functions-nested/src/utils.rs:72:22 9: <datafusion_functions_nested::replace::ArrayReplace as datafusion_expr::udf::ScalarUDFImpl>::invoke_batch at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:123:9 10: datafusion_expr::udf::ScalarUDFImpl::invoke_with_args at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:643:9 11: datafusion_expr::udf::ScalarUDF::invoke_with_args at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:237:9 12: <datafusion_physical_expr::scalar_function::ScalarFunctionExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate at /home/joe/Projects/datafusion/datafusion/physical-expr/src/scalar_function.rs:195:22 13: datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:639:29 14: <datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator as datafusion_common::tree_node::TreeNodeRewriter>::f_up at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:529:30 15: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}::{{closure}} at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:183:13 16: datafusion_common::tree_node::Transformed<T>::transform_parent at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:763:44 17: datafusion_common::tree_node::TreeNode::rewrite::{{closure}} at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:28:9 18: stacker::maybe_grow at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:55:9 19: datafusion_common::tree_node::TreeNode::rewrite at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:177:50 20: datafusion_optimizer::simplify_expressions::expr_simplifier::ExprSimplifier<S>::simplify_with_cycle_count at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:211:17 ... 71: tokio::runtime::runtime::Runtime::block_on at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:340:13 72: datafusion_cli::main at ./src/main.rs:131:5 73: core::ops::function::FnOnce::call_once at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. killed Process finished with exit code 9 ``` ## Array Input ``` > CREATE TABLE t (a int[], b int, c int); 0 row(s) fetched. Elapsed 0.004 seconds. > INSERT INTO t VALUES ([1,2,3], 1, 2), (NULL, 1, 2), ([1,2,3], NULL, 2), ([1,2,3], 1, NULL); +-------+ | count | +-------+ | 4 | +-------+ 1 row(s) fetched. Elapsed 0.006 seconds. > SELECT array_sort(a, b, c) FROM t; Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker > SELECT array_resize(a, b, c) FROM t; Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker ``` ### Expected behavior The functions should either return NULL or correctly handle the NULL input without panicking. ### Additional context I briefly investigated the functions and found the following: ## `array_sort` To fix this function we can add `NullHandling::Propagate` to the`impl ScalarUDFImpl for ArraySort`. Then we also have to update `array_sort_inner` to return NULL on NULL inputs. ## `array_resize` I haven't looked into this one yet, but I think the solution is probably similar to `array_sort`. ## `array_replace` Note this applies to `array_replace_all` and `array_replace_n`. This function actually does not want to propagate NULLs, it should be able to find NULL elements and replace with NULL elements. When the inputs are `ArrayRef`s then actually everything works correctly, it's only when the inputs are `Scalar` that things break. The issue is that the functions accepts `DataType::Null` instead of a null value for the list element type. Then we get type errors when trying to use the `DataType::Null` with a typed list. I believe if we updated the signature from `Signature::any(3, Volatility::Immutable)` to something that includes type information then we would fix the panic. However, we face a familiar problem that there is no way to represent `(list, element-type, element-type, i64)` with the current `Signature` struct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org