Thank you Weston!

On Thu, Oct 13, 2022 at 1:05 AM Weston Pace <weston.p...@gmail.com> wrote:
> 1. Yes.
> 2. I was going to say yes but...on closer examination...it appears
> that it is not applying backpressure.
>
> The SinkNode accumulates batches in a queue and applies backpressure.
> I thought we were using a sink node since it is the normal "accumulate
> batches into a queue" sink. However, the Substrait<->Python
> integration is not using a sink node but instead a custom
> ConsumingSinkNode (SubstraitSinkConsumer). The SubstraitSinkConsumer
> does accumulate batches in a queue (just like the sink node) but it is
> not handling backpressure. I've created [1] to track this.
>
> [1] https://issues.apache.org/jira/browse/ARROW-18025
>
> On Wed, Oct 12, 2022 at 9:02 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > Hello!
> >
> > I have some questions about how "pyarrow.substrait.run_query" works.
> >
> > Currently run_query returns a record batch reader. Since Acero is a
> > push-based model and the reader is pull-based, I'd assume the reader
> > object somehow accumulates the batches that are pushed to it. And I
> > wonder:
> >
> > (1) Do the output batches keep accumulating in the reader object until
> > someone reads from the reader?
> > (2) Are there any backpressure mechanisms implemented to prevent OOM if
> > data doesn't get pulled from the reader? (A bounded cache in the reader
> > object?)
> >
> > Thanks,
> > Li
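For anyone following along, the push-vs-pull mismatch discussed above can be illustrated with a small self-contained sketch. This is not Acero's actual implementation (the class and function names below are hypothetical); it just shows the mechanism: a push-based producer feeds a pull-based reader through a *bounded* queue, so when the consumer falls behind, the producer's push blocks, and that pause is the backpressure. An unbounded queue, like the SubstraitSinkConsumer's before ARROW-18025, would instead accumulate batches without limit and risk OOM.

```python
import queue
import threading

class BoundedSinkReader:
    """Hypothetical pull-based reader fed by a push-based producer.

    A bounded queue.Queue supplies the backpressure: push() blocks the
    producer whenever max_batches items are already waiting to be read.
    """

    _DONE = object()  # sentinel marking end of stream

    def __init__(self, max_batches):
        self._queue = queue.Queue(maxsize=max_batches)

    def push(self, batch):
        # Blocks when the queue is full -- this pause is the backpressure.
        self._queue.put(batch)

    def finish(self):
        self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            batch = self._queue.get()
            if batch is self._DONE:
                return
            yield batch

def produce(reader, n):
    # Stand-in for Acero pushing record batches into the sink.
    for i in range(n):
        reader.push(i)
    reader.finish()

reader = BoundedSinkReader(max_batches=2)
producer = threading.Thread(target=produce, args=(reader, 5))
producer.start()
# The consumer pulls; each get() frees a slot and unblocks the producer.
batches = list(reader)
producer.join()
print(batches)  # -> [0, 1, 2, 3, 4]
```

With `max_batches=2` the producer can only run two batches ahead of the consumer; dropping the `maxsize` argument reproduces the unbounded-accumulation behavior the JIRA issue describes.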