Hi,

Following up here: https://github.com/apache/datafusion/pull/17081 introduces the proposed changes. It covers window functions only for now; the others will follow.
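For anyone new to the issue described in the quoted message, here is a minimal Rust sketch of the hazard. SimpleUdf, RoundToDefault and RoundToDerived are hypothetical names, not DataFusion's API. The sketch only assumes, as the ticket describes, that the default equality looks at identity metadata (here just the name) and ignores the UDF's internal parameters. It contrasts that default with option 2, where equality is implemented explicitly on top of #[derive(PartialEq, Eq, Hash)].

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified UDF trait. NOT DataFusion's real API: it only mimics
// a default `equals`/`hash_value` that ignores the function's internal parameters.
trait SimpleUdf {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;

    // Error-prone default: same name means "equal", whatever the configuration.
    fn equals(&self, other: &dyn SimpleUdf) -> bool {
        self.name() == other.name()
    }

    // Default hash, consistent with the default `equals` (name only).
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.name().hash(&mut hasher);
        hasher.finish()
    }
}

// Hypothetical parameterized UDF that relies on the defaults.
#[allow(dead_code)]
struct RoundToDefault {
    digits: u32,
}

impl SimpleUdf for RoundToDefault {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "round_to"
    }
}

// The same UDF with derived PartialEq/Eq/Hash, used explicitly (option 2).
#[derive(PartialEq, Eq, Hash)]
struct RoundToDerived {
    digits: u32,
}

impl SimpleUdf for RoundToDerived {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "round_to"
    }
    // Equal only if `other` has the same concrete type and equal fields.
    fn equals(&self, other: &dyn SimpleUdf) -> bool {
        other
            .as_any()
            .downcast_ref::<Self>()
            .map_or(false, |o| self == o)
    }
    // Hash all fields via the derived `Hash` impl.
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

fn main() {
    let (a, b) = (RoundToDefault { digits: 1 }, RoundToDefault { digits: 3 });
    // true: an optimizer trusting this (e.g. for CSE) could merge the two calls.
    println!("default equals: {}", a.equals(&b));

    let (c, d) = (RoundToDerived { digits: 1 }, RoundToDerived { digits: 3 });
    // false: the derived impls take `digits` into account.
    println!("derived equals: {}", c.equals(&d));
}
```

The downcast through as_any is one common way to compare trait objects of possibly different concrete types; the cost of option 2 is that every parameterized UDF must opt in to this explicitly.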
Best,
PF

On Wed, 23 Jul 2025 at 17:39, Piotr Findeisen wrote:
> Hi,
>
> As [ticket] describes, the default equality/hash_value implementation for
> UDFs (scalar, aggregate, and window functions) is easy to overlook and
> therefore error-prone.
>
> The error-proneness is a risk, which is naturally subjective. As [fix-eq]
> showed, this risk has materialized many times over, even within the
> DataFusion code base, likely leading to query failures and incorrect
> results (via common subexpression elimination). My assumption is that
> third-party UDF implementations might be affected in similarly large
> numbers. I myself became aware of this only after seeing some really bogus
> query outcomes in a project building on DataFusion.
>
> There are two known ways to address this problem:
>
> 1. Fix the default implementation to be safe ([pr-bc]). This has the
>    downside of disabling common subexpression elimination for queries that
>    benefit from it today.
>
> 2. Require an explicit implementation, potentially making it very easy to
>    implement with #[derive] ([derive]). This has the downside of being an
>    API-breaking change, requiring the addition of these #[derive] lines.
>
> Please leave your thoughts in [ticket].
>
> Best,
> PF
>
> [ticket] https://github.com/apache/datafusion/issues/16677
> [fix-eq] https://github.com/apache/datafusion/pull/16781
> [pr-bc] https://github.com/apache/datafusion/pull/16681
> [derive] https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265
