MrPowers opened a new issue, #16710: https://github.com/apache/datafusion/issues/16710
DataFusion is underperforming the Polars streaming engine on some local join queries (1e8 rows of data on a MacBook M3 with 16GB of RAM):

<img width="640" height="480" alt="Image" src="https://github.com/user-attachments/assets/045061e2-4ac5-4436-8d01-009dbb69ea41" />

Here are the [join queries](https://github.com/apache/datafusion/blob/main/benchmarks/queries/h2o/join.sql).

I am guessing the join operator can be optimized, similar to how the filtering and aggregation operations were optimized. Here is an example of how the median function was made faster: https://github.com/apache/datafusion/issues/13550

See this epic for more info: https://github.com/apache/datafusion/issues/13548