Dandandan commented on issue #18939: URL: https://github.com/apache/datafusion/issues/18939#issuecomment-3584429177
Another data point is that DuckDB and other engines implement radix hash join and *seem* to get good results doing so. Maybe there are newer resources that show this improvement better? TPC-H joins are relatively easy: most joins are on an integer key column without many duplicate values when the build side is the "correct" one. The results from this paper are from TPC-H only (on which we already perform reasonably well for most queries).
