zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922200568

   > > For example, FileStream could count how many batches it sends 
back-to-back without yielding, and after a certain threshold, it yields. WDYT?
   > 
   > Perhaps DataSourceExec could wrap the stream returned by the DataSource 
with the YieldStream introduced in this PR?
   
   > > Thanks @zhuqi-lucas. The problem is clearly visible here, and the 
solution makes sense. It doesn't sacrifice performance, as seen in the 
benchmarks, and does not introduce any complexity.
   > > However, I'm wondering if this issue could arise in other places as 
well. For example, in Sort streams, one-side collecting joins, large window 
frames, etc. In short, many streams could suffer from the same problem. Rather 
than wrapping each of these individually and spreading this workaround like a 
virus across all pipeline-breaking streams, I think we should address it at the 
source level. If sources yield control periodically, regardless of the 
pipeline, we could solve this issue with a single, centralized fix. For 
example, FileStream could count how many batches it sends back-to-back without 
yielding, and after a certain threshold, it yields. WDYT?
   > 
   > I'm not sure, but maybe the repartition yield can also be removed if we do it this way
   
   Thank you @berkaysynnada, @pepijnve. I agree it's a better idea to add 
YieldStream to DataSource, and I will try to address this suggestion. 
   
   The repartition yield could possibly also be removed if we do it this way, 
which looks like an additional benefit. I will investigate that as well. It may 
be a follow-up, or I can add it to this PR.
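
   For illustration, the source-level idea quoted above (count how many batches are sent back-to-back without yielding, and yield once a threshold is reached) could look roughly like the sketch below. This is not the YieldStream from this PR and not DataFusion's API: `PollSource`, `AlwaysReady`, `YieldEvery`, `CountingWaker`, and `run_demo` are hypothetical names, and `PollSource` is a simplified, std-only stand-in for `futures::Stream` (no `Pin`), so the example runs without extra crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical, simplified stand-in for `futures::Stream` (assumption: no `Pin`).
trait PollSource {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// A source that is always ready, like a fast scan that never returns `Pending`.
struct AlwaysReady {
    remaining: usize,
}

impl PollSource for AlwaysReady {
    type Item = usize;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<usize>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        self.remaining -= 1;
        Poll::Ready(Some(self.remaining))
    }
}

/// Wrapper that returns `Pending` once after `threshold` consecutive ready items.
/// Assumption: `threshold >= 1`; with 0 it would never make progress.
struct YieldEvery<S> {
    inner: S,
    threshold: usize,
    consecutive: usize,
}

impl<S: PollSource> PollSource for YieldEvery<S> {
    type Item = S::Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        if self.consecutive >= self.threshold {
            // Give control back to the executor, but ask to be polled again.
            self.consecutive = 0;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let polled = self.inner.poll_next(cx);
        match &polled {
            Poll::Ready(Some(_)) => self.consecutive += 1,
            // The inner source yielded on its own or finished: restart the count.
            _ => self.consecutive = 0,
        }
        polled
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Drives the wrapped source to completion; returns (items seen, yields seen).
fn run_demo(total: usize, threshold: usize) -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut stream = YieldEvery {
        inner: AlwaysReady { remaining: total },
        threshold,
        consecutive: 0,
    };
    let (mut items, mut yields) = (0, 0);
    loop {
        match stream.poll_next(&mut cx) {
            Poll::Ready(Some(_)) => items += 1,
            Poll::Ready(None) => break,
            Poll::Pending => yields += 1,
        }
    }
    // Every forced yield must have been paired with a wake-up request.
    assert_eq!(yields, counter.0.load(Ordering::SeqCst));
    (items, yields)
}

fn main() {
    let (items, yields) = run_demo(10, 4);
    println!("items = {items}, yields = {yields}");
}
```

   The wrapper wakes itself before returning `Pending`, so a real executor would reschedule the task instead of stalling it. Placing such a counter at the source is what would let downstream pipeline-breaking operators benefit without each being wrapped individually.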


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

