Re: [PR] feat: support lambda function for scalar udf [datafusion]

via GitHub Mon, 25 Aug 2025 03:42:35 -0700


chenkovsky commented on code in PR #17220:
URL: https://github.com/apache/datafusion/pull/17220#discussion_r2297761959



##########
datafusion/expr/src/udf.rs:
##########
@@ -714,6 +774,58 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + Sync {
     fn documentation(&self) -> Option<&Documentation> {
         None
     }
+
+    /// Attempts to optimize or transform the function call.
+    ///
+    /// This method allows UDF implementations to provide optimized versions
+    /// of function calls or transform them into different expressions.
+    /// Returns `None` if no optimization is available.
+    ///
+    /// # Arguments
+    /// * `_args` - The function arguments to potentially optimize
+    ///
+    /// # Returns
+    /// An optional optimized expression, or None if no optimization is available
+    fn try_call(&self, _args: &[Expr]) -> Result<Option<Expr>> {
+        Ok(None)
+    }
+
+    /// Plans the scalar UDF implementation with lambda function support.
+    ///
+    /// This method enables UDF implementations to work with lambda functions
+    /// by allowing them to plan and prepare lambda expressions for execution.
+    /// Returns a new implementation instance if lambda planning is needed.
+    ///
+    /// # Arguments
+    /// * `_planner` - The lambda planner for converting logical lambdas to physical
+    /// * `_args` - The function arguments that may include lambda expressions
+    /// * `_input_dfschema` - The input schema context for lambda planning
+    ///
+    /// # Returns
+    /// An optional new UDF implementation with planned lambdas, or None if no planning is needed
+    fn plan(
+        &self,
+        _planner: &dyn LambdaPlanner,
+        _args: &[Expr],
+        _input_dfschema: &DFSchema,
+    ) -> Result<Option<Arc<dyn ScalarUDFImpl>>> {
+        Ok(None)
+    }

Review Comment:
   I considered your solutions before, but they all require significant changes. This solution is not perfect, but I think it is the one that requires the fewest modifications.
   
   Lambda functions are just a seasoning: although they are indispensable, most UDFs do not need them. For example, in Databricks, only the following functions use lambda functions:
   
   aggregate, array_sort, exists, filter, forall, map_filter, map_zip_with, transform, transform_keys, transform_values, zip_with.
   
   Therefore, I feel there is no need for us to make huge changes just because of this. That is why I chose the solution with the fewest modifications.
   
   
   > It requires users to call this function beforehand for the higher-order function to actually work.
   
   It's a form of currying. From my side, this is not hard to understand.
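
To make the currying idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR and does not use DataFusion types: `MiniUdf`, `Lambda`, `Abs`, `Transform`, and `run` are hypothetical stand-ins, and the lambda is a plain Rust closure rather than a planned physical expression. It only assumes the shape shown in the diff above: a `plan` method whose default returns nothing, and which a higher-order function overrides to return a new instance with the lambda bound.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a planned lambda; the real PR plans `Expr`
// lambdas into physical form, which is not modeled here.
type Lambda = Arc<dyn Fn(i64) -> i64 + Send + Sync>;

// Hypothetical, heavily simplified stand-in for `ScalarUDFImpl`.
trait MiniUdf {
    // Default: no lambda planning needed, so ordinary UDFs stay unchanged.
    fn plan(&self, _lambda: Option<Lambda>) -> Option<Arc<dyn MiniUdf>> {
        None
    }

    fn invoke(&self, input: &[i64]) -> Result<Vec<i64>, String>;
}

// An ordinary UDF: it never overrides `plan`.
struct Abs;

impl MiniUdf for Abs {
    fn invoke(&self, input: &[i64]) -> Result<Vec<i64>, String> {
        Ok(input.iter().map(|v| v.abs()).collect())
    }
}

// A higher-order UDF: `plan` binds the lambda and returns a new instance,
// which is the "currying" step.
struct Transform {
    lambda: Option<Lambda>,
}

impl MiniUdf for Transform {
    fn plan(&self, lambda: Option<Lambda>) -> Option<Arc<dyn MiniUdf>> {
        lambda.map(|l| Arc::new(Transform { lambda: Some(l) }) as Arc<dyn MiniUdf>)
    }

    fn invoke(&self, input: &[i64]) -> Result<Vec<i64>, String> {
        // Without a bound lambda the function cannot run; this is the
        // "must call plan beforehand" requirement raised in the review.
        let f = self
            .lambda
            .as_ref()
            .ok_or_else(|| "transform: plan() must be called first".to_string())?;
        Ok(input.iter().map(|v| f(*v)).collect())
    }
}

// What a planner would do in this sketch: call `plan`, use the returned
// instance if there is one, otherwise keep the original UDF.
fn run(
    udf: Arc<dyn MiniUdf>,
    lambda: Option<Lambda>,
    input: &[i64],
) -> Result<Vec<i64>, String> {
    let planned = udf.plan(lambda);
    let udf = planned.unwrap_or(udf);
    udf.invoke(input)
}
```

In this sketch, `Abs` compiles without mentioning `plan` at all, while `Transform` only works after `plan` has bound a lambda. That is the trade-off being discussed: a small trait change, at the cost of a required planning step for higher-order functions.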



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


