berkaysynnada commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922064218

   > Thanks @zhuqi-lucas. The problem is clearly visible here, and the solution 
makes sense. It doesn't sacrifice performance, as seen in the benchmarks, and 
does not introduce any complexity.
   > 
   > However, I'm wondering if this issue could arise in other places as well. 
For example, in Sort streams, one-side collecting joins, large window frames, 
etc. In short, many streams could suffer from the same problem. Rather than 
wrapping each of these individually and spreading this workaround like a virus 
across all pipeline-breaking streams, I think we should address it at the 
source level. If sources yield control periodically, regardless of the 
pipeline, we could solve this issue with a single, centralized fix. For 
example, FileStream could count how many batches it sends back-to-back without 
yielding, and after a certain threshold, it yields. WDYT?
   
   I'm not sure, but the yield in repartition could perhaps also be removed if
we do this.
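
   A rough sketch of the counting idea from the quoted comment, using only the
standard library. Everything here is hypothetical and for illustration only:
`BatchSource`, `AlwaysReady`, `YieldEvery`, `CountingWaker`, and `drive` are
made-up names, not DataFusion or `FileStream` APIs, and the threshold of 4 is
arbitrary. The wrapper counts batches returned back-to-back and, once the
threshold is reached, wakes itself and returns `Poll::Pending` so the
scheduler gets a chance to run other tasks.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a poll-based batch source (not DataFusion's API).
trait BatchSource {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// A source that is always ready, similar to a fast file scan.
struct AlwaysReady {
    next: u32,
    end: u32,
}

impl BatchSource for AlwaysReady {
    type Item = u32;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next < self.end {
            self.next += 1;
            Poll::Ready(Some(self.next - 1))
        } else {
            Poll::Ready(None)
        }
    }
}

/// Wrapper that yields after `threshold` consecutive ready batches.
struct YieldEvery<S> {
    inner: S,
    threshold: usize,
    consecutive: usize,
}

impl<S: BatchSource> BatchSource for YieldEvery<S> {
    type Item = S::Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        if self.consecutive >= self.threshold {
            // Reset the counter, ask to be polled again, and give up control.
            self.consecutive = 0;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        match self.inner.poll_next(cx) {
            Poll::Ready(Some(item)) => {
                self.consecutive += 1;
                Poll::Ready(Some(item))
            }
            // The inner source yielded or finished on its own: reset the count.
            other => {
                self.consecutive = 0;
                other
            }
        }
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls `source` to completion; returns (batches received, yields observed).
fn drive<S: BatchSource>(mut source: S) -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let (mut batches, mut yields) = (0, 0);
    loop {
        match source.poll_next(&mut cx) {
            Poll::Ready(Some(_)) => batches += 1,
            Poll::Ready(None) => break,
            Poll::Pending => yields += 1,
        }
    }
    // Every yield must have been paired with a wake-up request.
    assert_eq!(yields, counter.0.load(Ordering::SeqCst));
    (batches, yields)
}

fn main() {
    let source = YieldEvery {
        inner: AlwaysReady { next: 0, end: 10 },
        threshold: 4,
        consecutive: 0,
    };
    let (batches, yields) = drive(source);
    println!("batches = {batches}, yields = {yields}");
}
```

   With 10 batches and a threshold of 4, the wrapper gives up control twice
before the source is exhausted. Waking before returning `Pending` matters: 
without it, a real executor would never poll the stream again.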


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

