kumarUjjawal commented on PR #20154:
URL: https://github.com/apache/datafusion/pull/20154#issuecomment-3957451327

   > Also, for the inconsistencies, can you describe what is "fixed" by 
resolving them. Is there a potential wrong results bug? Something else? If it 
is wrong results, I wonder why we haven't seen it before
   
   As I understand it, the old nested-type hashers always called 
combine_hashes(*hash, value), so they always behaved as if rehash = true, even 
when hashing the first column (where hashes_buffer is pre-zeroed to 0).
   
   So they computed combine_hashes(0, inner_hash) instead of using inner_hash 
directly. This is not a wrong-results bug: every row went through the same 
code path, so results were internally consistent within a version, and joins 
and partitioning worked correctly.
   
   What this PR fixes is purely a semantic inconsistency and a clarity problem 
(per the original issue): nested types always behaved as rehash = true 
regardless of context, while primitive and dictionary types correctly skipped 
combine_hashes on the first column.
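
   To illustrate the difference described above, here is a minimal, 
self-contained sketch. It is not the DataFusion code: this combine_hashes is 
a stand-in written for the example, and hash_column and first_column_hashes 
are hypothetical helpers. It only shows that combining with a zeroed buffer 
changes the value, while equal inputs still map to equal outputs.

```rust
/// Stand-in hash combiner for illustration only (an assumption of this
/// sketch; the real DataFusion function may differ).
fn combine_hashes(l: u64, r: u64) -> u64 {
    let hash = (17u64 * 37).wrapping_add(l);
    hash.wrapping_mul(37).wrapping_add(r)
}

/// Hypothetical helper: fold one column's per-row hashes into the buffer.
/// With `rehash = false` the buffer is overwritten with the column hash;
/// with `rehash = true` the column hash is combined with what is already there.
fn hash_column(inner_hashes: &[u64], hashes_buffer: &mut [u64], rehash: bool) {
    for (hash, &value) in hashes_buffer.iter_mut().zip(inner_hashes) {
        *hash = if rehash {
            combine_hashes(*hash, value)
        } else {
            value
        };
    }
}

/// Hypothetical helper: hash a *first* column into a pre-zeroed buffer.
fn first_column_hashes(inner_hashes: &[u64], rehash: bool) -> Vec<u64> {
    let mut buffer = vec![0u64; inner_hashes.len()];
    hash_column(inner_hashes, &mut buffer, rehash);
    buffer
}
```

   Calling first_column_hashes(&inner, true) mimics the old nested-type 
behavior (combine_hashes(0, inner_hash) per row), and 
first_column_hashes(&inner, false) mimics what primitive and dictionary types 
already did. The two buffers differ, but each is deterministic in its input, 
which is why nothing broke as long as both sides of a join used the same path.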


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

