alamb commented on code in PR #10391:
URL: https://github.com/apache/datafusion/pull/10391#discussion_r1591446164


##########
datafusion/expr/src/udaf.rs:
##########
@@ -287,18 +286,23 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
     /// aggregate function was called.
     fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn 
Accumulator>>;
 
+    /// Return the fields for the function
+    fn field(&self, _args: FieldArgs) -> Result<Field> {

Review Comment:
   Is this the same information as `return_type`? If so, perhaps we should 
deprecate the `return_type` function?



##########
datafusion/functions-aggregate/src/covariance.rs:
##########
@@ -93,18 +95,29 @@ impl AggregateUDFImpl for CovarianceSample {
         Ok(DataType::Float64)
     }
 
-    fn state_fields(
-        &self,
-        name: &str,
-        _value_type: DataType,
-        _ordering_fields: Vec<Field>,
-    ) -> Result<Vec<Field>> {
+    fn field(&self, args: datafusion_expr::function::FieldArgs) -> 
Result<Field> {
+        Ok(Field::new(args.name, args.return_type.clone(), true))
+    }
+
+    fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> {
         Ok(vec![
-            Field::new(format_state_name(name, "count"), DataType::UInt64, 
true),
-            Field::new(format_state_name(name, "mean1"), DataType::Float64, 
true),
-            Field::new(format_state_name(name, "mean2"), DataType::Float64, 
true),
             Field::new(
-                format_state_name(name, "algo_const"),
+                format_state_name(args.name, "count"),
+                DataType::UInt64,
+                args.nullable,
+            ),
+            Field::new(
+                format_state_name(args.name, "mean1"),
+                DataType::Float64,
+                args.nullable,

Review Comment:
   aren't some of these fields nullable even if their input is not nullable 
(e.g. mean with no inputs is null)?



##########
datafusion/expr/src/function.rs:
##########
@@ -84,6 +84,72 @@ impl<'a> AccumulatorArgs<'a> {
     }
 }
 
+/// `StateFieldsArgs` encapsulates details regarding the required state fields 
for an aggregate function.
+///
+/// - `name`: Name of the aggregate function.
+/// - `input_type`: Input type of the aggregate function.
+/// - `return_type`: Return type of the aggregate function. Defined by `fn 
return_type` in AggregateUDFImpl.
+/// - `nullable`: Indicates whether the field can be null.
+pub struct FieldArgs<'a> {
+    pub name: &'a str,
+    pub input_type: &'a DataType,
+    pub return_type: &'a DataType,
+    pub nullable: bool,
+}
+
+impl<'a> FieldArgs<'a> {

Review Comment:
   Given all the fields are pub, I wonder how much benefit this API adds over 
simply creating FieldArgs directly 🤔 



##########
datafusion/expr/src/udaf.rs:
##########
@@ -309,19 +302,10 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
     /// The name of the fields must be unique within the query and thus should
     /// be derived from `name`. See [`format_state_name`] for a utility 
function
     /// to generate a unique name.
-    fn state_fields(
-        &self,
-        name: &str,
-        value_type: DataType,
-        ordering_fields: Vec<Field>,
-    ) -> Result<Vec<Field>> {
-        let value_fields = vec![Field::new(
-            format_state_name(name, "value"),
-            value_type,
-            true,
-        )];
-
-        Ok(value_fields.into_iter().chain(ordering_fields).collect())
+    ///
+    /// [`format_state_name`]: crate::utils::format_state_name
+    fn state_fields(&self, _args: StateFieldsArgs) -> Result<Vec<Field>> {
+        not_impl_err!("state_fields hasn't been implemented for {self:?} yet")

Review Comment:
   This would be an API change -- I think if we want to change it we should 
also update the documentation to reflect the change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to