jkosh44 opened a new issue, #14451:
URL: https://github.com/apache/datafusion/issues/14451

   ### Describe the bug
   
   The following array functions do not properly handle NULL input, they either 
return an error or panic (non-exhaustive):
   
   - `array_sort`
   - `array_replace`
   - `array_replace_all`
   - `array_replace_n`
   - `array_replace`
   
   Discovered in https://github.com/apache/datafusion/pull/14289
   
   ### To Reproduce
   
   ## Scalar input
   
   ```
   > SELECT array_sort([1,2,3], NULL);
   Internal error: could not cast value to 
arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker
   
   > SELECT array_resize([1,2,3], NULL, 0);
   Internal error: could not cast value to 
arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker
   
   > SELECT array_replace([1,2,3], NULL, NULL);
   thread 'main' panicked at 
/home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13:
   assertion `left == right` failed: Arrays with inconsistent types passed to 
MutableArrayData
     left: Int64
    right: Null
   stack backtrace:
      0: rust_begin_unwind
                at 
/rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
      1: core::panicking::panic_fmt
                at 
/rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
      2: core::panicking::assert_failed_inner
      3: core::panicking::assert_failed
                at 
/home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5
      4: arrow_data::transform::MutableArrayData::with_capacities
                at 
/home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13
      5: datafusion_functions_nested::replace::general_replace
                at 
/home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:303:23
      6: datafusion_functions_nested::replace::array_replace_inner
                at 
/home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:394:13
      7: core::ops::function::Fn::call
                at 
/home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:79:5
      8: datafusion_functions_nested::utils::make_scalar_function::{{closure}}
                at 
/home/joe/Projects/datafusion/datafusion/functions-nested/src/utils.rs:72:22
      9: <datafusion_functions_nested::replace::ArrayReplace as 
datafusion_expr::udf::ScalarUDFImpl>::invoke_batch
                at 
/home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:123:9
     10: datafusion_expr::udf::ScalarUDFImpl::invoke_with_args
                at 
/home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:643:9
     11: datafusion_expr::udf::ScalarUDF::invoke_with_args
                at 
/home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:237:9
     12: <datafusion_physical_expr::scalar_function::ScalarFunctionExpr as 
datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
                at 
/home/joe/Projects/datafusion/datafusion/physical-expr/src/scalar_function.rs:195:22
     13: 
datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar
                at 
/home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:639:29
     14: 
<datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator as 
datafusion_common::tree_node::TreeNodeRewriter>::f_up
                at 
/home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:529:30
     15: 
datafusion_common::tree_node::TreeNode::rewrite::{{closure}}::{{closure}}
                at 
/home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:183:13
     16: datafusion_common::tree_node::Transformed<T>::transform_parent
                at 
/home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:763:44
     17: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}
                at 
/home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:28:9
     18: stacker::maybe_grow
                at 
/home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:55:9
     19: datafusion_common::tree_node::TreeNode::rewrite
                at 
/home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:177:50
     20: 
datafusion_optimizer::simplify_expressions::expr_simplifier::ExprSimplifier<S>::simplify_with_cycle_count
                at 
/home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:211:17
     ...
     71: tokio::runtime::runtime::Runtime::block_on
                at 
/home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:340:13
     72: datafusion_cli::main
                at ./src/main.rs:131:5
     73: core::ops::function::FnOnce::call_once
                at 
/home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose 
backtrace.
   killed
   
   Process finished with exit code 9
   ```
   
   ## Array Input
   ```
   > CREATE TABLE t (a int[], b int, c int);
   0 row(s) fetched. 
   Elapsed 0.004 seconds.
   
   > INSERT INTO t VALUES ([1,2,3], 1, 2), (NULL, 1, 2), ([1,2,3], NULL, 2), 
([1,2,3], 1, NULL);
   +-------+
   | count |
   +-------+
   | 4     |
   +-------+
   1 row(s) fetched. 
   Elapsed 0.006 seconds.
   
   > SELECT array_sort(a, b, c) FROM t;
   Internal error: could not cast value to 
arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker
   > SELECT array_resize(a, b, c) FROM t;
   Internal error: could not cast value to 
arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker
   ```
   
   ### Expected behavior
   
   The functions should either return NULL or correctly handle the NULL input 
without panicking.
   
   ### Additional context
   
   I briefly investigated the functions and found the following:
   
   ## `array_sort`
   
   To fix this function we can add `NullHandling::Propagate` to the`impl 
ScalarUDFImpl for ArraySort`. Then we also have to update `array_sort_inner` to 
return NULL on NULL inputs.
   
   ## `array_resize`
   
   I haven't looked into this one yet, but I think the solution is probably 
similar to `array_sort`.
   
   ## `array_replace`
   
   Note this applies to `array_replace_all` and `array_replace_n`.
   
   This function actually does not want to propagate NULLs, it should be able 
to find NULL elements and replace with NULL elements. When the inputs are 
`ArrayRef`s then actually everything works correctly, it's only when the inputs 
are `Scalar` that things break. The issue is that the functions accepts 
`DataType::Null` instead of a null value for the list element type. Then we get 
type errors when trying to use the `DataType::Null` with a typed list.
   
   I believe if we updated the signature from `Signature::any(3, 
Volatility::Immutable)` to something that includes type information then we 
would fix the panic. However, we face a familiar problem that there is no way 
to represent `(list, element-type, element-type, i64)` with the current 
`Signature` struct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to