kumarUjjawal commented on PR #20154: URL: https://github.com/apache/datafusion/pull/20154#issuecomment-3957451327
> Also, for the inconsistencies, can you describe what is "fixed" by resolving them. Is there a potential wrong results bug? Something else? If it is wrong results, I wonder why we haven't seen it before

As I understand it, the old nested-type hashers always called `combine_hashes(*hash, value)`, so they always behaved as if `rehash = true`, even when called for the first column (where `hashes_buffer` was pre-zeroed). They therefore computed `combine_hashes(0, inner_hash)` instead of using `inner_hash` directly.

So I don't think this was a wrong-results bug: results were internally consistent within a version, and joins and partitioning worked correctly.

What this PR fixes is purely a semantic inconsistency and a clarity problem (per the original issue): nested types always behaved as `rehash = true` regardless of context, while primitive and dictionary types correctly skipped `combine_hashes` on the first column.
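To make the difference concrete, here is a minimal, self-contained sketch of the two behaviors. It is not the DataFusion code: the body of `combine_hashes` is a stand-in mixing function written for this example, the helpers `hash_column_always_combine` and `hash_column` are hypothetical names, and the `inner_hashes` slice stands for already-computed per-row hashes of a nested value.

```rust
/// Stand-in for a hash-combining function. The constants are illustrative
/// and are not claimed to match DataFusion's real `combine_hashes`.
fn combine_hashes(l: u64, r: u64) -> u64 {
    let hash = (17u64 * 37).wrapping_add(l);
    hash.wrapping_mul(37).wrapping_add(r)
}

/// Hypothetical model of the old nested-type behavior: always combine with
/// whatever is in the buffer, even for the first column (buffer is all zeros).
fn hash_column_always_combine(inner_hashes: &[u64], hashes_buffer: &mut [u64]) {
    for (hash, &inner) in hashes_buffer.iter_mut().zip(inner_hashes) {
        *hash = combine_hashes(*hash, inner);
    }
}

/// Hypothetical model of the behavior that honors `rehash`: the first column
/// (`rehash == false`) stores the inner hash directly, later columns combine.
fn hash_column(inner_hashes: &[u64], hashes_buffer: &mut [u64], rehash: bool) {
    for (hash, &inner) in hashes_buffer.iter_mut().zip(inner_hashes) {
        *hash = if rehash {
            combine_hashes(*hash, inner)
        } else {
            inner
        };
    }
}
```

Under these assumptions, the two schemes produce different numeric hashes for the first column, but each scheme still maps equal inputs to equal hashes. That is why equality-based operations such as joins and partitioning would be unaffected as long as a single scheme is used throughout.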
