berkaysynnada commented on issue #14287:
URL: https://github.com/apache/datafusion/issues/14287#issuecomment-2614018742

   We have designed a poll-based repartition mechanism that polls its input 
whenever any of its output partitions is polled. Unlike the round-robin 
pattern, this approach ensures a truly even workload distribution across 
consumer partitions: each batch is sent to a partition that has completed 
its computation and is ready to process the next batch.
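
   To make the difference from round-robin concrete, here is a minimal, 
deterministic sketch. It is not the DataFusion implementation: the function 
names and the per-partition cost model are hypothetical, the cost slice is 
assumed to be non-empty, and the producer is assumed to always have a batch 
available (the "fast producer" case).

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Round-robin: batch `i` goes to partition `i % n`, whether or not that
/// partition is ready. Returns the time at which each partition finishes.
fn round_robin(num_batches: usize, cost_per_batch: &[u64]) -> Vec<u64> {
    let mut busy_until = vec![0u64; cost_per_batch.len()];
    for i in 0..num_batches {
        let p = i % cost_per_batch.len();
        busy_until[p] += cost_per_batch[p];
    }
    busy_until
}

/// Poll-driven: the next batch goes to whichever partition becomes ready
/// (i.e. would poll) first. Returns the time at which each partition finishes.
fn poll_driven(num_batches: usize, cost_per_batch: &[u64]) -> Vec<u64> {
    let mut busy_until = vec![0u64; cost_per_batch.len()];
    // Min-heap of (time at which the partition is ready, partition index).
    let mut ready: BinaryHeap<Reverse<(u64, usize)>> =
        (0..cost_per_batch.len()).map(|p| Reverse((0, p))).collect();
    for _ in 0..num_batches {
        let Reverse((t, p)) = ready.pop().expect("at least one partition");
        busy_until[p] = t + cost_per_batch[p];
        ready.push(Reverse((busy_until[p], p)));
    }
    busy_until
}
```

   With 10 batches and two consumers whose per-batch costs are 1 and 4 time 
units, `round_robin(10, &[1, 4])` leaves the slow partition busy until time 20, 
while `poll_driven(10, &[1, 4])` has both partitions finish at time 8.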
   
   This mechanism also exhibits prefetching behavior, similar to 
SortPreservingMerge, although the prefetching is limited to a single batch (or 
potentially up to the number of partitions; this will be finalized based on 
benchmark results).
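
   As a rough illustration of a bounded prefetch (again, not the actual 
implementation), a fixed-capacity channel lets the producer run ahead of the 
consumer by at most `capacity` batches. The helper name below is hypothetical; 
only the standard library's `sync_channel` is used.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes batches into a bounded buffer without any consumer draining it and
/// returns how many batches were accepted before the buffer filled up.
fn prefetch_until_full(capacity: usize, batches: &[u32]) -> usize {
    // `_rx` is kept alive so the channel stays connected for the whole call.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    for &batch in batches {
        match tx.try_send(batch) {
            Ok(()) => accepted += 1,
            // Buffer is full: the producer must wait for the next poll.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

   With a capacity of 1 only a single batch is buffered ahead of the consumer; 
raising the capacity to the number of partitions corresponds to the other 
option mentioned above.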
   
   The implementation is underway, and the initial benchmark results are very 
promising. In theory, this approach should perform better especially when the 
producer is faster than the consumers, which I believe is the case 
@westonpace describes in the issue description.
   
   @Weijun-H is working on the implementation, and I hope we can open the PR 
in the coming weeks once it is in a robust and optimized state.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

