berkaysynnada commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922064218
> Thanks @zhuqi-lucas. The problem is clearly visible here, and the solution makes sense. It doesn't sacrifice performance, as the benchmarks show, and it doesn't introduce any complexity.
>
> However, I'm wondering whether this issue could arise in other places as well, for example in Sort streams, one-side collecting joins, large window frames, etc. In short, many streams could suffer from the same problem. Rather than wrapping each of these individually and spreading this workaround like a virus across all pipeline-breaking streams, I think we should address it at the source level. If sources yield control periodically, regardless of the pipeline, we could solve this issue with a single, centralized fix. For example, FileStream could count how many batches it sends back-to-back without yielding and, once that count passes a certain threshold, yield. WDYT?

I'm not sure, but if we do this, the repartition yield could maybe be removed as well.
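To make the source-level idea concrete, here is a rough sketch of the counting approach, using only the standard library. Everything in it is hypothetical: `BatchSource`, `VecSource`, `YieldingSource`, and `drive` are illustrative names, not DataFusion APIs, and the real `FileStream` would implement `Stream` over record batches rather than this stand-in trait. It also assumes `threshold >= 1`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a batch source; not DataFusion's `FileStream`.
trait BatchSource {
    type Batch;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Batch>>;
}

/// A source that is always ready: the case that never gives control back.
struct VecSource(std::vec::IntoIter<u32>);

impl BatchSource for VecSource {
    type Batch = u32;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        Poll::Ready(self.0.next())
    }
}

/// Returns `Pending` once after every `threshold` back-to-back batches.
/// Assumes `threshold >= 1`; with 0 it would never make progress.
struct YieldingSource<S> {
    inner: S,
    threshold: usize,
    consecutive: usize,
}

impl<S: BatchSource> BatchSource for YieldingSource<S> {
    type Batch = S::Batch;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Batch>> {
        if self.consecutive >= self.threshold {
            // Reset the counter and ask to be polled again before yielding.
            self.consecutive = 0;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let polled = self.inner.poll_next(cx);
        match &polled {
            Poll::Ready(Some(_)) => self.consecutive += 1,
            // The inner source yielded (or finished) on its own.
            _ => self.consecutive = 0,
        }
        polled
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls the wrapped source to completion.
/// Returns (batches received, number of `Pending` results, number of wake-ups).
fn drive(batches: Vec<u32>, threshold: usize) -> (Vec<u32>, usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut source = YieldingSource {
        inner: VecSource(batches.into_iter()),
        threshold,
        consecutive: 0,
    };
    let (mut out, mut yields) = (Vec::new(), 0);
    loop {
        match source.poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => out.push(batch),
            Poll::Ready(None) => break,
            Poll::Pending => yields += 1,
        }
    }
    (out, yields, counter.0.load(Ordering::SeqCst))
}
```

Calling `drive((1..=7).collect(), 3)` delivers all seven batches and returns `Pending` twice along the way, waking the task each time so it gets rescheduled. Because the counter lives in the source, downstream pipeline-breaking operators would not each need their own wrapper under this scheme.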