alamb commented on PR #11261:
URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2211137678
Thank you @Rachelint ❤️
> It is mainly due to AggregateUDFImpl is defined in expr crate, and the
needed PhysicalExpr is defined in datafusion-physical-expr-common crate which
will refer exps a dependency. And if we import PhysicalExpr in expr, that will
lead to cyclic dependecies...
This makes sense -- I agree the ` AggregateUDFImpl` API should not be in
terms of `PhysicalExpr` (for the dependency reasons you describe)
One way I could think to get around this would be to figure out the column
statistics in the physical planner and pass it directly (rather than making the
UDF figure them out). Something like
```rust
pub trait AggregateUDFImpl {
...
/// If the output of this aggregate function can be determined
/// only from input statistics (e.g. COUNT(..) with 0 rows, is always 0)
/// return that value
///
/// # Arguments
/// stats: Overall statistics for the input (including row count)
/// arg_stats: Statistics, if known, for each input argument
fn output_from_stats(
&self,
stats: &Statistics,
arg_stats: &[Option<&ColumnStatistics>]
) -> Option<ScalarValue> {
None
}
...
}
```
ANother option might be to use the narrower API described on
https://github.com/apache/datafusion/issues/11153
```rust
impl AggregateExpr {
/// Return the value of the aggregate function, if known, given the number
of input rows.
///
/// Return None if the value can not be determined solely from the input.
///
/// # Examples
/// * The `COUNT` aggregate would return `Some(11)` given `num_rows = 11`
/// * The `MIN` aggregate would return `Some(Null)` given `num_rows = 0
/// * The `MIN` aggregate would return `None` given num_rows = 11
fn output_from_rows(&self, num_rows: usize) -> Option<ScalarValue> { None }
...
}
```
Though I think your idea is better (pass in statistics in generla)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]