alamb commented on PR #21817: URL: https://github.com/apache/datafusion/pull/21817#issuecomment-4652021544
> My preferred long-term solution would be to implement a hash table specialized for semi/anti joins. I believe that would be not only faster, but also more general, since the optimization could apply to all data types.
>
> One related idea is to fully separate the semi/anti join path from the existing hash join implementation. I think this would make both paths more organized and potentially more performant: see a related issue https://github.com/apache/datafusion/issues/22710

I think @neilconway did some similar work here (to optimize SEMI joins) - https://github.com/apache/datafusion/pull/22794
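For readers skimming the thread, here is a minimal sketch of why a semi/anti join can get by with a simpler hash structure than a general hash join: the probe side only needs to know whether a key exists on the build side, not which build rows hold it. This is an illustration using the Rust standard library only, not DataFusion's implementation. The function names `probe_rows`, `semi_join`, and `anti_join` are hypothetical, and the sketch assumes a single join key with no NULLs and no extra join filter.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper: returns the indices of probe rows whose key
/// membership in the build side equals `keep_matches`.
/// Assumes a single non-NULL key column and no additional join filter.
fn probe_rows<K: Eq + Hash>(
    build_keys: &[K],
    probe_keys: &[K],
    keep_matches: bool,
) -> Vec<usize> {
    // A set of distinct keys is enough: no per-key row lists or
    // duplicate chains are needed, unlike an inner-join hash table.
    let build_set: HashSet<&K> = build_keys.iter().collect();

    let mut out = Vec::new();
    for (row, key) in probe_keys.iter().enumerate() {
        if build_set.contains(key) == keep_matches {
            out.push(row);
        }
    }
    out
}

/// Semi join: probe rows that have at least one match on the build side.
/// Each probe row is emitted at most once, even if the build side has duplicates.
pub fn semi_join<K: Eq + Hash>(build_keys: &[K], probe_keys: &[K]) -> Vec<usize> {
    probe_rows(build_keys, probe_keys, true)
}

/// Anti join: probe rows that have no match on the build side.
pub fn anti_join<K: Eq + Hash>(build_keys: &[K], probe_keys: &[K]) -> Vec<usize> {
    probe_rows(build_keys, probe_keys, false)
}
```

Because the helper is generic over any `K: Eq + Hash`, the same path works for every hashable key type, which is the generality the quoted comment points at. A real implementation would also have to handle NULL semantics, multi-column keys, and join filters.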
