Hi,

As [ticket] describes, the default equality/hash_value implementation for UDFs (scalar, aggregate, and window functions) is easy to miss and therefore error-prone.
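To make the pitfall concrete, here is a minimal, self-contained Rust sketch. The `Udf` trait, `RoundTo`, and `RoundToFixed` are hypothetical simplifications and not DataFusion's actual `ScalarUDFImpl` API. The sketch assumes a default `equals`/`hash_value` that looks only at the function name. Under that assumption, two instances that are configured differently, and therefore return different results, still compare equal. That is what lets common subexpression elimination merge expressions that should stay distinct. The second struct follows the spirit of option 2 below: an explicit implementation backed by `#[derive]`.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified UDF trait (NOT DataFusion's real API).
trait Udf {
    fn name(&self) -> &str;
    fn as_any(&self) -> &dyn Any;
    fn invoke(&self, x: f64) -> f64;

    /// Risky default: equality looks only at the name, so any
    /// configuration stored in the implementing struct is ignored.
    fn equals(&self, other: &dyn Udf) -> bool {
        self.name() == other.name()
    }

    /// Matching risky default for hashing (name only).
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.name().hash(&mut hasher);
        hasher.finish()
    }
}

/// Rounds `x` to `digits` decimal places.
fn round_to(x: f64, digits: u32) -> f64 {
    let factor = 10f64.powi(digits as i32);
    (x * factor).round() / factor
}

/// Relies on the defaults: `digits` changes the result but not equality.
struct RoundTo {
    digits: u32,
}

impl Udf for RoundTo {
    fn name(&self) -> &str {
        "round_to"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn invoke(&self, x: f64) -> f64 {
        round_to(x, self.digits)
    }
}

/// Same UDF, but equality and hashing are explicit and backed by #[derive].
#[derive(Debug, PartialEq, Eq, Hash)]
struct RoundToFixed {
    digits: u32,
}

impl Udf for RoundToFixed {
    fn name(&self) -> &str {
        "round_to"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn invoke(&self, x: f64) -> f64 {
        round_to(x, self.digits)
    }
    fn equals(&self, other: &dyn Udf) -> bool {
        // Equal only if `other` is the same type AND all fields match.
        other
            .as_any()
            .downcast_ref::<Self>()
            .map_or(false, |o| self == o)
    }
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher); // derived Hash covers `digits`
        hasher.finish()
    }
}

fn main() {
    let (a, b) = (RoundTo { digits: 0 }, RoundTo { digits: 2 });
    // Different results, yet the name-only default reports them as equal,
    // so an optimizer deduplicating "equal" expressions could merge them.
    println!("{} vs {}, equal: {}", a.invoke(1.234), b.invoke(1.234), a.equals(&b));

    let (c, d) = (RoundToFixed { digits: 0 }, RoundToFixed { digits: 2 });
    // Field-aware equality keeps differently configured instances apart.
    println!("fixed equal: {}", c.equals(&d));
}
```

With the derived version, two instances are equal only when their type and configuration match. The cost is that every UDF has to provide (or derive) these impls, which is the trade-off described in option 2.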
How much risk this error-proneness poses is naturally subjective. However, as [fix-eq] showed, the risk has materialized many times over, even within the DataFusion code base, likely leading to query failures and incorrect results (via common subexpression elimination). My assumption is that third-party UDF implementations may be affected in similarly large numbers. I myself became aware of this only after seeing some clearly bogus query results in a project built on DataFusion.

There are two known ways to address this problem:

1. Fix the default implementation to be safe ([pr-bc]). The downside is that this disables common subexpression elimination for queries that benefit from it today.
2. Require an explicit implementation, potentially making it very easy to provide one with #[derive] (see [derive]). The downside is that this is an API-breaking change, since existing UDFs would need these #[derive] lines added.

Please leave your thoughts in [ticket].

Best,
PF

[ticket] https://github.com/apache/datafusion/issues/16677
[fix-eq] https://github.com/apache/datafusion/pull/16781
[pr-bc] https://github.com/apache/datafusion/pull/16681
[derive] https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265