shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2690645120
We were able to work around the `count(*)` regression in Sail by doing the following when constructing an `Expr::AggregateFunction` for `count_udaf`: https://github.com/lakehq/sail/blob/4fc24cb4162a153eaeaa9d6344ec1d0e97d7e693/crates/sail-plan/src/function/aggregate.rs#L190-L198 ``` let args = match arguments.as_slice() { [expr::Expr::Wildcard { qualifier: None, options: _, }] => { vec![expr::Expr::Literal(COUNT_STAR_EXPANSION)] } _ => arguments, }; ``` We assume that the aggregate function will receive wildcard expressions without nesting. Otherwise, the input is invalid (e.g., `count(* AS foo)`). Other projects adopting this workaround need to be aware of the implications for their use cases and determine whether they can make the same assumption. The only other regression is in the `expr_fn::substr(string, position)` UDF. ### Thoughts If `Expr::Wildcard` no longer works, why is it still being kept around? There is a concerning pattern of various zombie operations that no longer work, are surfaced in pre-release testing, and yet are still exposed to downstream users. In the current and previous pre-release testing, we have surfaced the following zombie operations: - `Expr::Wildcard { qualifier: None, .. }` - `ScalarUDFImpl::invoke()` - `ScalarUDFImpl::invoke_batch()` - `ScalarUDFImpl::is_nullable()` (previous release testing) If these operations no longer serve a purpose, why don't we either remove them entirely as proper breaking changes or modify them to throw an error upon construction? Alternatively, if they are no longer relevant for downstream users, why not restrict their exposure to internal use within DataFusion when applicable? Otherwise, we are knowingly releasing bugs and breaks, which defeats the whole purpose of pre-release testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org