Re: [I] Speed up hash partitioning [datafusion]

via GitHub Mon, 24 Mar 2025 03:32:33 -0700


Dandandan commented on issue #6822:
URL: https://github.com/apache/datafusion/issues/6822#issuecomment-2747639530


   I wrote up some ideas for supporting selection vectors inside hash join and 
aggregate (I believe we didn't have issues for those yet?)
   
   This seems likely to give more substantial gains than optimizing only the 
partitioning code, since even when optimized we still need to copy the inputs 
(and run CoalesceBatchesExec).
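
   To make the contrast concrete, here is a minimal sketch of the two 
approaches. It uses only the Rust standard library; `Batch`, `partition_of`, 
`partition_by_copy`, `partition_by_selection` and `sum_selected` are 
hypothetical names chosen for illustration and are not DataFusion or Arrow 
APIs. The real operators work on Arrow record batches, so treat this as a 
sketch of the idea rather than the proposed implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy single-column batch (hypothetical stand-in for an Arrow RecordBatch).
struct Batch {
    keys: Vec<i64>,
}

/// Map a key to an output partition by hashing it.
fn partition_of(key: i64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Copying approach: every row is written into a new per-partition batch.
fn partition_by_copy(batch: &Batch, num_partitions: usize) -> Vec<Batch> {
    let mut out: Vec<Batch> = (0..num_partitions)
        .map(|_| Batch { keys: Vec::new() })
        .collect();
    for &key in &batch.keys {
        out[partition_of(key, num_partitions)].keys.push(key);
    }
    out
}

/// Selection-vector approach: each partition only records the row indices
/// that belong to it; the input batch itself is shared and never copied.
fn partition_by_selection(batch: &Batch, num_partitions: usize) -> Vec<Vec<u32>> {
    let mut selections = vec![Vec::new(); num_partitions];
    for (row, &key) in batch.keys.iter().enumerate() {
        selections[partition_of(key, num_partitions)].push(row as u32);
    }
    selections
}

/// A downstream consumer (here a trivial sum) that reads the shared batch
/// through a selection vector instead of requiring a materialized copy.
fn sum_selected(batch: &Batch, selection: &[u32]) -> i64 {
    selection.iter().map(|&row| batch.keys[row as usize]).sum()
}
```

   In the second variant the consumer has to understand selection vectors, 
which is why the change would land inside the join and aggregate operators 
rather than in the partitioning code alone.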
   
   https://github.com/apache/datafusion/issues/15382
   https://github.com/apache/datafusion/issues/15383
   
   I think it might not be too hard to implement, at least for join (I am less 
certain about aggregate).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


