mwylde opened a new issue, #14701: URL: https://github.com/apache/datafusion/issues/14701
The AggregateUDF trait includes a function `fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>>` that returns the types of the aggregate's intermediate state. This is useful when the states need to be stored, for example for multi-level aggregation. For our use case we also need to store the accumulator states as part of our checkpointing system.

This works as long as we're using the standard accumulators, but breaks down with sliding accumulators. Some aggregates (for example, sum) have different state fields in sliding mode; for sum, there is an additional "count" field, used to determine when all of the data has been retracted. There doesn't seem to be any way to determine what the state fields will be for a sliding accumulator.

A couple of possible options here:

* Follow the pattern of `is_distinct`, which can also produce different accumulators. It is passed to the `state_fields` function as a field on the `StateFieldsArgs` struct; we could add a similar field, `is_sliding`.
* It seems like `state_fields` is really a property of the accumulator, not of the aggregate (an aggregate may produce different accumulators depending on the options and on which accumulator function is called), so it might be better to have the `state_fields` function on the accumulator instead of the aggregate.

We've gone ahead and implemented the first approach in our fork, but it would be nice to get something upstream that addresses this.
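To make the two options concrete, here is a minimal sketch using simplified stand-in types. None of these are DataFusion's real definitions: `StateField`, the reduced `StateFieldsArgs`, `sum_state_fields`, the `Accumulator` trait shown here, and the `name[sum]` / `name[count]` field names and type tags are all assumptions made for illustration. Only the idea comes from the description above: a sliding sum carries an extra count field.

```rust
// Simplified stand-ins for illustration only; NOT DataFusion's real types.

/// Hypothetical reduced state field: a name plus a type tag.
#[derive(Debug, Clone, PartialEq)]
pub struct StateField {
    pub name: String,
    pub data_type: &'static str,
}

/// Hypothetical reduced version of `StateFieldsArgs`.
pub struct StateFieldsArgs<'a> {
    pub name: &'a str,
    /// Mirrors the existing flag; not used by this sketch.
    pub is_distinct: bool,
    /// Option 1: the proposed new flag.
    pub is_sliding: bool,
}

/// Option 1: the aggregate picks its state layout from the args.
pub fn sum_state_fields(args: &StateFieldsArgs) -> Vec<StateField> {
    let mut fields = vec![StateField {
        name: format!("{}[sum]", args.name),
        data_type: "Int64",
    }];
    if args.is_sliding {
        // The sliding accumulator also stores how many values are still in
        // the window, so it can tell when everything has been retracted.
        fields.push(StateField {
            name: format!("{}[count]", args.name),
            data_type: "UInt64",
        });
    }
    fields
}

/// Option 2: each accumulator reports its own state layout.
pub trait Accumulator {
    fn state_fields(&self, name: &str) -> Vec<StateField>;
}

pub struct SumAccumulator;
pub struct SlidingSumAccumulator;

impl Accumulator for SumAccumulator {
    fn state_fields(&self, name: &str) -> Vec<StateField> {
        sum_state_fields(&StateFieldsArgs { name, is_distinct: false, is_sliding: false })
    }
}

impl Accumulator for SlidingSumAccumulator {
    fn state_fields(&self, name: &str) -> Vec<StateField> {
        sum_state_fields(&StateFieldsArgs { name, is_distinct: false, is_sliding: true })
    }
}
```

With either shape, a checkpointing layer could ask for the state layout that matches the accumulator it is actually going to serialize, rather than assuming the non-sliding layout.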