alamb commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1920358041
########## datafusion/expr/src/udf.rs: ########## @@ -342,6 +348,56 @@ pub struct ScalarFunctionArgs<'a> { pub return_type: &'a DataType, } +#[derive(Debug)] +pub struct ReturnTypeArgs<'a> { Review Comment: Here are some suggestions to improve the comments: ```suggestion /// Arguments passed to [`ScalarUDFImpl::return_type_from_args`] #[derive(Debug)] pub struct ReturnTypeArgs<'a> { ``` ########## datafusion/expr/src/udf.rs: ########## @@ -342,6 +348,56 @@ pub struct ScalarFunctionArgs<'a> { pub return_type: &'a DataType, } +#[derive(Debug)] +pub struct ReturnTypeArgs<'a> { + /// The data types of the arguments to the function + pub arg_types: &'a [DataType], + /// The Utf8 arguments to the function, if the expression is not Utf8, it will be empty string Review Comment: If possible I suggest using the following type: 1. It can distinguish between string/non-string arguments 3. It avoids clones ```rust pub arguments: &'a [Option<&'a ScalarValue>]>`, ``` ########## datafusion/expr/src/udf.rs: ########## @@ -490,6 +547,40 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { self.return_type(arg_types) } + /// What [`DataType`] will be returned by this function, given the Review Comment: I also recommend removing the comments from `return_type_from_exprs` now that it is deprecated and you have moved the content here ########## datafusion/expr/src/udf.rs: ########## @@ -481,6 +537,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// This function must consistently return the same type for the same /// logical input even if the input is simplified (e.g. it must return the same /// value for `('foo' | 'bar')` as it does for ('foobar'). + #[deprecated(since = "45.0.0", note = "Use `return_type_from_args` instead")] Review Comment: Can you also please update the comments on `return_type` to refer to `return_type_from_args` (it currently talks about `return_type_from_exprs`: ```rust /// If you provide an implementation for [`Self::return_type_from_exprs`], /// DataFusion will not call `return_type` (this function). In this case it /// is recommended to return [`DataFusionError::Internal`]. ``` ########## datafusion/functions/src/core/named_struct.rs: ########## @@ -44,11 +44,18 @@ fn named_struct_expr(args: &[ColumnarValue]) -> Result<ColumnarValue> { .chunks_exact(2) .enumerate() .map(|(i, chunk)| { - let name_column = &chunk[0]; let name = match name_column { - ColumnarValue::Scalar(ScalarValue::Utf8(Some(name_scalar))) => name_scalar, - _ => return exec_err!("named_struct even arguments must be string literals, got {name_column:?} instead at position {}", i * 2) + ColumnarValue::Scalar(ScalarValue::Utf8(Some(name_scalar))) => { + name_scalar + } + // TODO: Implement Display for ColumnarValue Review Comment: That is a good todo -- maybe we can make a follow on ticket for it ########## datafusion/functions/src/core/coalesce.rs: ########## @@ -174,39 +177,3 @@ fn get_coalesce_doc() -> &'static Documentation { .build() }) } - -#[cfg(test)] Review Comment: why remove these tests? ########## datafusion/physical-expr/src/scalar_function.rs: ########## @@ -83,6 +86,53 @@ impl ScalarFunctionExpr { } } + /// Create a new Scalar function + pub fn try_new( + fun: Arc<ScalarUDF>, + args: Vec<Arc<dyn PhysicalExpr>>, + schema: &Schema, + ) -> Result<Self> { + let name = fun.name().to_string(); + let arg_types = args + .iter() + .map(|e| e.data_type(schema)) + .collect::<Result<Vec<_>>>()?; + + // verify that input data types is consistent with function's `TypeSignature` + data_types_with_scalar_udf(&arg_types, &fun)?; + + let nullables = args + .iter() + .map(|e| e.nullable(schema)) + .collect::<Result<Vec<_>>>()?; + + let arguments: Vec<String> = args + .iter() + .map(|e| { + e.as_any() + .downcast_ref::<Literal>() + .map(|literal| match literal.value() { + ScalarValue::Utf8(Some(s)) => s.clone(), Review Comment: It would be really nice to avoid this clone here -- if we changed the signature to `&[Option<&str>]` I think that would be easier to use and more efficient -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org