alamb commented on code in PR #10439:
URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597109610


##########
datafusion/functions-array/src/make_array.rs:
##########
@@ -111,6 +111,25 @@ impl ScalarUDFImpl for MakeArray {
     fn aliases(&self) -> &[String] {
         &self.aliases
     }
+
+    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {

Review Comment:
   love it!



##########
datafusion/optimizer/src/analyzer/type_coercion.rs:
##########
@@ -914,7 +942,7 @@ mod test {
             .err()
             .unwrap();
         assert_eq!(
-            "type_coercion\ncaused by\nError during planning: Coercion from 
[Utf8] to the signature Uniform(1, [Float64]) failed.",
+            "type_coercion\ncaused by\nError during planning: 
[data_types_with_aggregate_udf] Coercion from [Utf8] to the signature 
Uniform(1, [Float64]) failed.",

Review Comment:
   Maybe we could make this error say "User defined coercsion rule failed" so 
it was clearer what happened



##########
datafusion/expr/src/signature.rs:
##########
@@ -91,15 +91,10 @@ pub enum TypeSignature {
     /// # Examples
     /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
     Variadic(Vec<DataType>),
-    /// One or more arguments of an arbitrary but equal type.
-    /// DataFusion attempts to coerce all argument types to match the first 
argument's type
-    ///
-    /// # Examples
-    /// Given types in signature should be coercible to the same final type.
-    /// A function such as `make_array` is `VariadicEqual`.
-    ///
-    /// `make_array(i32, i64) -> make_array(i64, i64)`
-    VariadicEqual,
+    /// One or more arguments of an arbitrary type and coerced with 
user-defined coercion rules.

Review Comment:
   Since both `TypeSignature::VariadicCoercion` and 
`TypeCoercion::UniformCoercion` call into `ScalarUDFImpl::coerce_types` I 
wonder if there is a reason to have both.
   
   Maybe instead we could add a single variant for both cases. Perhaps 
something like: 
   
   ```rust
   pub enum TypeSignature {
   ...
     /// The acceptable signature and coercions rules to coerce arguments to 
this
     /// signature are special for this function. If this signature is 
specified,
     /// Datafusion will call [`ScalarUDFImpl::coerce_types`] to prepare 
argument types.
     FunctionDefined, 
   ...
   }
   ```



##########
datafusion/optimizer/src/analyzer/type_coercion.rs:
##########
@@ -833,12 +865,8 @@ mod test {
             signature: Signature::uniform(1, vec![DataType::Float32], 
Volatility::Stable),
         })
         .call(vec![lit("Apple")]);
-        let plan_err = Projection::try_new(vec![udf], empty)
+        Projection::try_new(vec![udf], empty)

Review Comment:
   As long as the message is tested elsewhere I think this is fine



##########
datafusion/sqllogictest/test_files/coalesce.slt:
##########
@@ -23,7 +23,7 @@ select coalesce(1, 2, 3);
 1
 
 # test with first null
-query IT
+query ?T

Review Comment:
   I would still prefer to make the change to the coalsce coercion rules as its 
own PR, but I am also fine if we include it in this PR



##########
datafusion/expr/src/udf.rs:
##########
@@ -420,6 +425,11 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
     fn short_circuits(&self) -> bool {
         false
     }
+
+    /// Coerce the types of the arguments to the types that the function 
expects

Review Comment:
   Nice -- I think it would be nice to expand this documentation to help people 
understand the API some more
   
   Perhaps something like 
   
   ```suggestion
       /// Coerce arguments of a function call to types that the function can 
evaluate.
       ///
       /// This function is only called if [`Self::signature`] returns 
[`TypeSignature::UserDefined`]. Most
       /// UDFs should return one of the other variants of `TypeSignature` 
which handle common
       /// cases
       ///
       /// See the [type coercion module](datafusion_expr::type_coercion)
       /// documentation for more details on type coercion
       ///
       /// For example, if your function requires a floating point arguments, 
but the user calls
       /// it like `my_func(1::int)` (aka with `1` as an integer), coerce_types 
could return `[DataType::Float64]`
       /// to ensure the argument was cast to `1::double`
       ///
       /// # Parameters
       /// * `arg_types`: The argument types of the arguments  this function 
with
       ///
       /// # Return value
       /// A Vec the same length as `arg_types`. DataFusion will `CAST` the 
function call
       /// arguments to these specific types. 
   ```



##########
datafusion/expr/src/type_coercion/functions.rs:
##########
@@ -184,32 +345,14 @@ fn get_valid_types(
             .iter()
             .map(|valid_type| (0..*number).map(|_| 
valid_type.clone()).collect())
             .collect(),
-        TypeSignature::VariadicEqual => {
-            let new_type = current_types.iter().skip(1).try_fold(
-                current_types.first().unwrap().clone(),
-                |acc, x| {
-                    // The coerced types found by `comparison_coercion` are 
not guaranteed to be
-                    // coercible for the arguments. `comparison_coercion` 
returns more loose
-                    // types that can be coerced to both `acc` and `x` for 
comparison purpose.
-                    // See `maybe_data_types` for the actual coercion.
-                    let coerced_type = comparison_coercion(&acc, x);
-                    if let Some(coerced_type) = coerced_type {
-                        Ok(coerced_type)
-                    } else {
-                        internal_err!("Coercion from {acc:?} to {x:?} failed.")
-                    }
-                },
-            );
-
-            match new_type {
-                Ok(new_type) => vec![vec![new_type; current_types.len()]],
-                Err(e) => return Err(e),
-            }
+        TypeSignature::VariadicCoercion | TypeSignature::UniformCoercion(_) => 
{
+            return internal_err!(
+                "Coercion signature is handled in function-specific 
get_valid_types."

Review Comment:
   ```suggestion
                   "Coercion signature should be handled by in 
function-specific coerce_types."
   ```



##########
datafusion/functions/src/core/nvl2.rs:
##########
@@ -55,22 +58,31 @@ impl ScalarUDFImpl for NVL2Func {
     }
 
     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        if arg_types.len() != 3 {
-            return Err(plan_datafusion_err!(
-                "{}",
-                utils::generate_signature_error_msg(
-                    self.name(),
-                    self.signature().clone(),
-                    arg_types,
-                )
-            ));
-        }
         Ok(arg_types[1].clone())
     }
 
     fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
         nvl2_func(args)
     }
+
+    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
+        let new_type = arg_types.iter().skip(1).try_fold(

Review Comment:
   this pattern to coerce all arguments to the type of the first one is 
repeated several times -- maybe we could make some sort of utility function to 
do it
   
   As a follow on PR perhaps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to