MrPowers opened a new issue, #16710:
URL: https://github.com/apache/datafusion/issues/16710

   DataFusion underperforms the Polars streaming engine on some join queries 
run locally (1e8 rows of data on a MacBook M3 with 16 GB of RAM):
   
   <img width="640" height="480" alt="Image" 
src="https://github.com/user-attachments/assets/045061e2-4ac5-4436-8d01-009dbb69ea41" />
   
   Here are the [join 
queries](https://github.com/apache/datafusion/blob/main/benchmarks/queries/h2o/join.sql).
   
   I am guessing the join operator can be optimized, similar to how the 
filtering and aggregation operations were optimized.
   
   Here is an example of how the median function was made faster: 
https://github.com/apache/datafusion/issues/13550
   
   See this epic for more info: 
https://github.com/apache/datafusion/issues/13548


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

