pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922223237

   > yes, it sounds better. It will bring less friction and will be more 
inclusive
   
   Think about this a bit more, it's an interesting design question whose 
responsibility it is to yield. While the cancellation problem is solvable by 
yielding at the producer side (i.e. FileStream (or a wrapper around it)), it's 
more clear/obvious why this is being done (I think) at the consumer side where 
the loop calling `poll_next` is located.
   
   Returning a Pending from the producer is also not a guarantee that 
cancellation will work correctly. As an example, my colleague implemented an 
execution plan that polls a number of input streams in a round robin fashion. 
If a stream returns Pending, rather than returning Pending itself right away it 
will first check if any of the other streams has data available. I don't think 
there's anything inherently wrong with that, but the effect is that any 
intermediate Pending coming from a FileStream would just get gobbled up.
   
   The downside is indeed that every implementation that tries to completely 
drain one or more of its input streams (aggregation, the build side of joins, 
etc) needs to ensure it yields every now and then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to