Thank you Weston!

On Thu, Oct 13, 2022 at 1:05 AM Weston Pace <weston.p...@gmail.com> wrote:
> 1. Yes.
> 2. I was going to say yes but...on closer examination...it appears
> that it is not applying backpressure.
>
> The SinkNode accumulates batches in a queue and applies backpressure.
> I thought we were using a sink node since it is the normal "accumulate
> batches into a queue" sink. However, the Substrait<->Python
> integration is not using a sink node but instead a custom
> ConsumingSinkNode (SubstraitSinkConsumer). The SubstraitSinkConsumer
> does accumulate batches in a queue (just like the sink node) but it is
> not handling backpressure. I've created [1] to track this.
>
> [1] https://issues.apache.org/jira/browse/ARROW-18025
>
> On Wed, Oct 12, 2022 at 9:02 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > Hello!
> >
> > I have some questions about how "pyarrow.substrait.run_query" works.
> >
> > Currently run_query returns a record batch reader. Since Acero is a
> > push-based model and the reader is pull-based, I'd assume the reader
> > object somehow accumulates the batches that are pushed to it. And I
> > wonder:
> >
> > (1) Do the output batches keep accumulating in the reader object until
> > someone reads from the reader?
> > (2) Are there any backpressure mechanisms implemented to prevent OOM if
> > data doesn't get pulled from the reader? (A bounded cache in the reader
> > object?)
> >
> > Thanks,
> > Li
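For anyone following along, the push-vs-pull mismatch discussed above can be illustrated with a small self-contained sketch. This is not Acero's actual implementation (the class and function names below are hypothetical); it just shows the mechanism: a push-based producer feeds a pull-based reader through a *bounded* queue, so when the consumer falls behind, the producer's push blocks, and that pause is the backpressure. An unbounded queue, like the SubstraitSinkConsumer's before ARROW-18025, would instead accumulate batches without limit and risk OOM.

```python
import queue
import threading

class BoundedSinkReader:
    """Hypothetical pull-based reader fed by a push-based producer.

    A bounded queue.Queue supplies the backpressure: push() blocks the
    producer whenever max_batches items are already waiting to be read.
    """

    _DONE = object()  # sentinel marking end of stream

    def __init__(self, max_batches):
        self._queue = queue.Queue(maxsize=max_batches)

    def push(self, batch):
        # Blocks when the queue is full -- this pause is the backpressure.
        self._queue.put(batch)

    def finish(self):
        self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            batch = self._queue.get()
            if batch is self._DONE:
                return
            yield batch

def produce(reader, n):
    # Stand-in for Acero pushing record batches into the sink.
    for i in range(n):
        reader.push(i)
    reader.finish()

reader = BoundedSinkReader(max_batches=2)
producer = threading.Thread(target=produce, args=(reader, 5))
producer.start()
# The consumer pulls; each get() frees a slot and unblocks the producer.
batches = list(reader)
producer.join()
print(batches)  # -> [0, 1, 2, 3, 4]
```

With `max_batches=2` the producer can only run two batches ahead of the consumer; dropping the `maxsize` argument reproduces the unbounded-accumulation behavior the JIRA issue describes.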