mwylde opened a new issue, #14701:
URL: https://github.com/apache/datafusion/issues/14701

   The AggregateUDF trait includes a function `fn state_fields(&self, args: 
StateFieldsArgs) -> Result<Vec<Field>>` to get the types for the intermediate 
state of the aggregate. This is useful if we need to store the states, for 
example for multi-level aggregation.
   
   For our use case we also need to store the accumulator states as part of our 
checkpointing system. This works as long as we're using the standard 
accumulators, but breaks down if you want to use sliding accumulators. This is 
because some aggregates (for example, sum) have different state fields in 
sliding mode (for sum, there is an additional "count" field, used to determine 
when we've retracted all of the data).
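
   To illustrate why the extra "count" field matters, here is a minimal, 
self-contained sketch of a sum accumulator that supports retraction. The 
`SlidingSum` type and its methods are hypothetical stand-ins written for this 
example, not DataFusion's actual implementation; the only point is that the 
checkpointed state is `(sum, count)` rather than just `sum`.

```rust
/// Hypothetical, simplified sliding sum (NOT DataFusion's real accumulator).
/// It keeps a row count next to the running sum so that "all rows retracted"
/// can be told apart from "rows present that happen to sum to zero".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SlidingSum {
    sum: i64,
    count: u64,
}

impl SlidingSum {
    /// Add rows entering the window.
    pub fn update(&mut self, values: &[i64]) {
        for v in values {
            self.sum += *v;
            self.count += 1;
        }
    }

    /// Remove rows leaving the window.
    pub fn retract(&mut self, values: &[i64]) {
        for v in values {
            self.sum -= *v;
            self.count = self.count.saturating_sub(1);
        }
    }

    /// The sum of an empty window is NULL (None), not 0.
    pub fn evaluate(&self) -> Option<i64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum)
        }
    }

    /// Intermediate state to checkpoint: two fields instead of one.
    pub fn state(&self) -> (i64, u64) {
        (self.sum, self.count)
    }

    /// Restore an accumulator from a checkpointed state.
    pub fn from_state(state: (i64, u64)) -> Self {
        SlidingSum { sum: state.0, count: state.1 }
    }
}
```

   A checkpointing system that only knows about the single-field state of the 
non-sliding accumulator would have nowhere to store `count`, which is the 
mismatch described above.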
   
   But there doesn't seem to be any way to determine what the state fields will 
be for a sliding accumulator. A couple of possible options here:
   
   * Follow the pattern of `is_distinct`, which can also produce different 
accumulators. It is passed to the `state_fields` function as a field on the 
`StateFieldsArgs` struct; we could add a similar field for `is_sliding`.
   * It seems like state_fields is really a property of the accumulator, not of 
the aggregate (as various aggregates may produce different accumulators 
depending on the options and which accumulator function is called), so it might 
be better to have the state_fields function on the accumulator instead of the 
aggregate.
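
   The two options above can be sketched roughly as follows. Everything in 
this block is a simplified stand-in defined for the example: the `DataType`, 
`Field`, `StateFieldsArgs` and `Accumulator` definitions here are not 
DataFusion's real types, the `is_sliding` field is the proposed (hypothetical) 
addition, and the field names are illustrative only.

```rust
// Simplified stand-ins for illustration; NOT the real arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    UInt64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

fn field(agg_name: &str, suffix: &str, data_type: DataType) -> Field {
    Field { name: format!("{agg_name}[{suffix}]"), data_type }
}

// Option 1: pass a flag through the args struct, mirroring `is_distinct`.
pub struct StateFieldsArgs<'a> {
    pub name: &'a str,
    pub is_distinct: bool,
    /// Hypothetical new field proposed in this issue.
    pub is_sliding: bool,
}

pub fn sum_state_fields(args: &StateFieldsArgs<'_>) -> Vec<Field> {
    let mut fields = vec![field(args.name, "sum", DataType::Int64)];
    if args.is_sliding {
        // Sliding mode also tracks how many rows are currently in the window.
        fields.push(field(args.name, "count", DataType::UInt64));
    }
    fields
}

// Option 2: let each accumulator describe its own state layout.
pub trait Accumulator {
    fn state_fields(&self, agg_name: &str) -> Vec<Field>;
}

pub struct SumAccumulator;
pub struct SlidingSumAccumulator;

impl Accumulator for SumAccumulator {
    fn state_fields(&self, agg_name: &str) -> Vec<Field> {
        vec![field(agg_name, "sum", DataType::Int64)]
    }
}

impl Accumulator for SlidingSumAccumulator {
    fn state_fields(&self, agg_name: &str) -> Vec<Field> {
        vec![
            field(agg_name, "sum", DataType::Int64),
            field(agg_name, "count", DataType::UInt64),
        ]
    }
}
```

   With option 1 the aggregate has to know about every accumulator variant it 
can create; with option 2 the state layout lives next to the code that 
produces the state, at the cost of needing an accumulator instance (or an 
equivalent handle) before the fields can be queried.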
   
   We've gone ahead and implemented the first approach in our fork, but it 
would be nice to get something upstream that addresses this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


