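Dandandan commented on PR #16153:
URL: https://github.com/apache/datafusion/pull/16153#issuecomment-2906697087

> > This optimization is neat and already covers the common case of joins on primary keys. I think we can further optimize the join hash table, even for cases where _some_ keys might have chains. Instead of looking for a 0 value in the `next` vector, we can encode whether there is a next value in the top bit of the current slot, thus saving a lookup in the `next` vector on every probe that has at least a single match.
> >
> > I don't know how well this plays with the streaming join hash map though =)
> >
> > That sounds like a neat thing to try! Another (smaller) optimization I can think of is to encode the hashmap and the next list with `u32` indices / offsets if possible (so the data fits more easily in CPU cache by being halved).

Filed https://github.com/apache/datafusion/issues/16179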
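A minimal Rust sketch of how the two ideas above could combine, assuming a simplified chained layout: each hash-table slot holds the index of the most recently inserted build row, and `next[row]` holds the index of the previous row with the same hash. The top bit of each stored `u32` records whether more rows follow in the chain, so a probe with exactly one match never reads `next`, and storing `u32` instead of `u64` halves the per-row footprint. `TaggedJoinMap`, `HAS_NEXT`, and `INDEX_MASK` are hypothetical names, not DataFusion's `JoinHashMap` API; `std::collections::HashMap` keyed by the raw hash stands in for DataFusion's actual table, and a real join must still compare key values because distinct keys can share a hash. It does not address the streaming join hash map concern.

```rust
use std::collections::HashMap;

/// Top bit of a stored `u32`: set when more rows follow this one in its chain.
const HAS_NEXT: u32 = 1 << 31;
/// Low 31 bits: the build-side row index.
const INDEX_MASK: u32 = !HAS_NEXT;

/// Hypothetical chained join hash map using tagged `u32` indices.
struct TaggedJoinMap {
    /// hash -> tagged index of the most recently inserted row with that hash
    map: HashMap<u64, u32>,
    /// next[row] -> tagged index of the previous row with the same hash;
    /// only read when `row` was stored with `HAS_NEXT` set
    next: Vec<u32>,
}

impl TaggedJoinMap {
    fn with_capacity(rows: usize) -> Self {
        // Row indices must fit in the 31 bits left after the flag.
        assert!(rows <= INDEX_MASK as usize, "too many build rows for u32 tags");
        TaggedJoinMap {
            map: HashMap::with_capacity(rows),
            next: vec![0; rows],
        }
    }

    /// Record build row `row` under `hash`.
    fn insert(&mut self, hash: u64, row: u32) {
        match self.map.get_mut(&hash) {
            Some(slot) => {
                // The old head (with its own flag) becomes this row's successor,
                self.next[row as usize] = *slot;
                // and the new head is flagged because a successor now exists.
                *slot = row | HAS_NEXT;
            }
            None => {
                // First row for this hash: no flag, so probes stop here.
                self.map.insert(hash, row);
            }
        }
    }

    /// Append every candidate row for `hash` to `out`, newest first.
    fn probe(&self, hash: u64, out: &mut Vec<u32>) {
        let Some(&slot) = self.map.get(&hash) else {
            return;
        };
        let mut cur = slot;
        loop {
            out.push(cur & INDEX_MASK);
            if (cur & HAS_NEXT) == 0 {
                // End of chain: a single match never touches `next`.
                break;
            }
            cur = self.next[(cur & INDEX_MASK) as usize];
        }
    }
}
```

Inserting rows 0, 2 and 3 under one hash and probing it yields `[3, 2, 0]` (newest first), while a hash with a single row returns after one slot read. The trade-off of this encoding is a 31-bit cap on build-side row indices, which the constructor asserts.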
> > This optimization is neat and already covers the common case of joins on primary keys. I think we can further optimize the join hash table - even for cases where _some_ keys might have chains. Instead of looking for a 0 value in the `next` vector, we can encode whether there is a next value in the top bit of the current slot - thus saving a lookup in the `next` vector on every probe that has at least a single match. > > I don't know how well this plays with the streaming join hash map though =) > > That sounds like a neat thing to try! Another (smaller) optimization I can think of is encode hashmap and next list with `u32` indices / offsets if possible (so it fits more easily in CPU cache by halving the data). Filed https://github.com/apache/datafusion/issues/16179 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org