Question about pyarrow.substrait.run_query

Li Jin Wed, 12 Oct 2022 09:02:10 -0700

Hello!

I have some questions about how "pyarrow.substrait.run_query" works.


Currently run_query returns a record batch reader. Since Acero uses a
push-based execution model and the reader is pull-based, I'd assume the reader
object somehow accumulates the batches that are pushed to it. I'm wondering:

(1) Do the output batches keep accumulating in the reader object until
someone reads from the reader?
(2) Are there any back-pressure mechanisms implemented to prevent OOM if
data doesn't get pulled from the reader? (For example, a bounded cache in the
reader object?)
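For illustration only, here is a minimal Python sketch of the pattern the two questions describe, assuming a bounded buffer sits between a push-based producer and a pull-based reader. It says nothing about how pyarrow or Acero actually implement run_query. BoundedBatchReader and run_demo are hypothetical names, and plain integers stand in for record batches.

```python
import queue
import threading

# Hypothetical sketch, NOT pyarrow's implementation: a push-based producer
# (standing in for an execution engine) feeds a pull-based reader through a
# bounded queue. Integers stand in for record batches.

_END = object()  # sentinel marking the end of the stream


class BoundedBatchReader:
    """Pull-based reader fed by a push-based producer via a bounded buffer."""

    def __init__(self, max_buffered: int):
        # With max_buffered > 0, put() blocks once that many items are
        # waiting; that blocking is the back-pressure signal to the producer.
        # With max_buffered <= 0, queue.Queue is unbounded, so items
        # accumulate until someone reads them.
        self._queue = queue.Queue(maxsize=max_buffered)
        self.high_water = 0  # largest number of items seen buffered at once

    def push(self, batch):
        # Called by the producer; blocks while the buffer is full.
        self._queue.put(batch)
        self.high_water = max(self.high_water, self._queue.qsize())

    def finish(self):
        # Signal that no more batches will be pushed.
        self._queue.put(_END)

    def __iter__(self):
        # Consumer side: pull batches until the end-of-stream sentinel.
        while True:
            item = self._queue.get()
            if item is _END:
                return
            yield item


def _produce(reader, batches):
    for batch in batches:
        reader.push(batch)
    reader.finish()


def run_demo(n_batches: int, max_buffered: int):
    """Push n_batches through a reader in a background thread and pull them all."""
    reader = BoundedBatchReader(max_buffered)
    producer = threading.Thread(target=_produce, args=(reader, range(n_batches)))
    producer.start()
    received = list(reader)
    producer.join()
    return received, reader.high_water


received, high_water = run_demo(n_batches=10, max_buffered=2)
print(received, high_water)
```

With a bounded buffer the producer stalls once the buffer is full, so memory use is capped by the buffer size rather than by the size of the result. With an unbounded buffer (max_buffered=0) the producer never blocks, which is the "keep accumulating until someone reads" behavior asked about in (1).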

Thanks,
Li
