Hi,

Following up here: https://github.com/apache/datafusion/pull/17081
proposes the changes.
It covers only window functions for now; scalar and aggregate functions will follow.

Best,
PF



On Wed, 23 Jul 2025 at 17:39, Piotr Findeisen wrote:

> Hi,
>
> As [ticket] describes, the default equality/hash_value implementation for
> UDFs (scalar, aggregate and window functions) is easy to overlook when
> implementing a UDF, and is therefore error-prone.
>
> How much risk this error-proneness poses is, naturally, subjective. As
> [fix-eq] showed, the risk has materialized many times over, even within the
> DataFusion code base, likely leading to query failures and incorrect
> results (common subexpression elimination can merge distinct UDF calls that
> wrongly compare equal). My assumption is that third-party UDF
> implementations might be affected in similarly large numbers. I myself
> became aware of this only after seeing some really bogus query outcomes in
> a project building on DataFusion.
>
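To make the failure mode concrete, here is a minimal, self-contained Rust sketch. The Udf trait, AddN and dedup_calls are hypothetical simplifications, not DataFusion APIs: the trait's default equals/hash_value look only at the function name, and dedup_calls is a toy stand-in for common subexpression elimination. Two calls of the same UDF with different parameters then collapse into one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified UDF trait (NOT the DataFusion API).
trait Udf {
    fn name(&self) -> &str;
    fn invoke(&self, x: i64) -> i64;

    /// Easy-to-miss default: UDFs with the same name compare equal,
    /// regardless of any parameters they carry.
    fn equals(&self, other: &dyn Udf) -> bool {
        self.name() == other.name()
    }

    /// Default hash, consistent with `equals`: hashes only the name.
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.name().hash(&mut hasher);
        hasher.finish()
    }
}

/// A parameterized UDF whose author did not override equals/hash_value.
struct AddN {
    n: i64,
}

impl Udf for AddN {
    fn name(&self) -> &str {
        "add_n"
    }
    fn invoke(&self, x: i64) -> i64 {
        x + self.n
    }
}

/// Toy stand-in for common subexpression elimination: every call is replaced
/// by the first earlier call that has the same hash and compares equal.
fn dedup_calls<'a>(calls: &[&'a dyn Udf]) -> Vec<&'a dyn Udf> {
    let mut seen: Vec<&'a dyn Udf> = Vec::new();
    let mut out = Vec::new();
    for &udf in calls {
        let prev = seen
            .iter()
            .copied()
            .find(|s| s.hash_value() == udf.hash_value() && s.equals(udf));
        match prev {
            Some(p) => out.push(p),
            None => {
                seen.push(udf);
                out.push(udf);
            }
        }
    }
    out
}

fn main() {
    let plus_one = AddN { n: 1 };
    let plus_ten = AddN { n: 10 };
    let calls: [&dyn Udf; 2] = [&plus_one, &plus_ten];
    let results: Vec<i64> = dedup_calls(&calls).iter().map(|u| u.invoke(0)).collect();
    // Expected [1, 10], but add_n(n = 10) was merged into add_n(n = 1).
    println!("{:?}", results); // prints [1, 1]
}
```

No error is raised: the second call silently reuses the first call's parameters, which is exactly the kind of bogus result described above.
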
> There are two known ways to address this problem:
>
> 1. Fix the default implementation to be safe ([pr-bc]). This has the
> downside of disabling common subexpression elimination for queries that
> benefit from it today.
>
> 2. Require an explicit implementation, potentially making that very easy
> with #[derive(...)] (see [derive]). This has the downside of being an
> API-breaking change, as existing UDFs need these #[derive] lines added.
>
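One way option 2 could look, as a minimal sketch that is not necessarily what [derive] or the PR does: a hypothetical Udf trait requires object-safe DynEq/DynHash helper traits as supertraits, and those helpers are blanket-implemented for every Eq + Hash type. A UDF author then only adds #[derive(PartialEq, Eq, Hash)], and forgetting it is a compile error rather than a wrong query result. Udf, DynEq, DynHash, as_any, udf_eq and udf_hash are all illustrative names, not DataFusion APIs.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Object-safe equality, blanket-implemented for every `Eq + Any` type.
trait DynEq {
    fn dyn_eq(&self, other: &dyn Any) -> bool;
}

impl<T: Eq + Any> DynEq for T {
    fn dyn_eq(&self, other: &dyn Any) -> bool {
        // Values of a different concrete type are never equal.
        other.downcast_ref::<T>().map_or(false, |o| self == o)
    }
}

/// Object-safe hashing, blanket-implemented for every `Hash` type.
trait DynHash {
    fn dyn_hash(&self) -> u64;
}

impl<T: Hash> DynHash for T {
    fn dyn_hash(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Hypothetical UDF trait (NOT the DataFusion API). Equality and hashing are
/// mandatory supertraits with no name-based fallback, so an implementor that
/// forgets #[derive(PartialEq, Eq, Hash)] fails to compile.
trait Udf: DynEq + DynHash + Any {
    fn name(&self) -> &str;
    fn as_any(&self) -> &dyn Any;
}

fn udf_eq(a: &dyn Udf, b: &dyn Udf) -> bool {
    a.dyn_eq(b.as_any())
}

fn udf_hash(u: &dyn Udf) -> u64 {
    u.dyn_hash()
}

/// The only extra line the UDF author writes: derive equality and hashing.
#[derive(PartialEq, Eq, Hash)]
struct AddN {
    n: i64,
}

impl Udf for AddN {
    fn name(&self) -> &str {
        "add_n"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let plus_one = AddN { n: 1 };
    let plus_ten = AddN { n: 10 };
    let plus_one_again = AddN { n: 1 };
    // prints "add_n(n=1) == add_n(n=10): false"
    println!(
        "{}(n=1) == {}(n=10): {}",
        plus_one.name(),
        plus_ten.name(),
        udf_eq(&plus_one, &plus_ten)
    );
    println!("{}", udf_eq(&plus_one, &plus_one_again)); // true
    println!("{}", udf_hash(&plus_one) == udf_hash(&plus_one_again)); // true
}
```

Because equality goes through downcast_ref, UDFs of different concrete types never compare equal, so a hash collision between two types cannot merge them.
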
> Please leave your thoughts in [ticket].
>
> Best,
> PF
>
>
> [ticket] https://github.com/apache/datafusion/issues/16677
> [fix-eq] https://github.com/apache/datafusion/pull/16781
> [pr-bc] https://github.com/apache/datafusion/pull/16681
> [derive] https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265
>
>
>
