On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote: > I realise int4hashset_hash() is broken, > since two int4hashset's that are considered equal, > can by coincidence get different hashes: ... > Do we have any ideas on how to fix this without sacrificing performance?
The problem was due to hashset_hash() function accumulating the hashes of individual elements in a non-commutative manner. As a consequence, the final hash value was sensitive to the order in which elements were inserted into the hashset. This behavior led to inconsistencies, as logically equivalent sets (i.e., sets with the same elements but in different orders) produced different hash values. Solved by modifying the hashset_hash() function to use a commutative operation when combining the hashes of individual elements. This change ensures that the final hash value is independent of the element insertion order, and logically equivalent sets produce the same hash. An somewhat unfortunate side-effect of this fix, is that we can no longer visually sort the hashset output format, since it's not lexicographically sorted. I think this is an acceptable trade-off for a hashset type, since the only alternative I see would be to sort the elements, but then it wouldn't be a hashset, but a treeset, which different Big-O complexity. New patch is attached, which will henceforth always be a complete patch, to avoid the hassle of having to assemble incremental patches. /Joel
hashset-0.0.1-8da4aa8.patch
Description: Binary data