Hold on, Yaron. I think Ivan and I got something working with the existing
code; Ivan will post details in a bit.

On Mon, Jul 25, 2022 at 3:25 PM Yaron Gvili <rt...@hotmail.com> wrote:

> Yes, I think you mean this post by Weston:
> https://lists.apache.org/thread/llfm5dfh2988w2w4j6off417w9szp1tg
> I'll look into adding this "sequential" option to the source node and
> report back.
>
>
> Yaron.
> ________________________________
> From: Li Jin <ice.xell...@gmail.com>
> Sent: Monday, July 25, 2022 11:39 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [C++] Clarifying the behavior of source node and executor
>
> Now that I think about it more, Weston has probably answered this in
> another mailing thread: this is not guaranteed, and our observation of
> batches coming out of the file reader + source node in order happened by
> chance. Perhaps we can look into adding an option to the source node to
> ensure "sequential" output.
>
> Li
>
> On Mon, Jul 25, 2022 at 11:18 AM Yaron Gvili <rt...@hotmail.com> wrote:
>
> > I've also been using the source node with a generator, but observed
> > batches in random order (in a one- to two-month-old version of Arrow).
> > So I'd be surprised if ordering is guaranteed, and I'm also interested in
> > how to obtain such a guarantee.
> >
> >
> > Yaron.
> > ________________________________
> > From: Li Jin <ice.xell...@gmail.com>
> > Sent: Monday, July 25, 2022 11:10 AM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [C++] Clarifying the behavior of source node and executor
> >
> > Sorry, the link to the generator above is wrong. We traced into the code
> > and found that it uses BackgroundGenerator:
> >
> > https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L1581
> >
> > On Mon, Jul 25, 2022 at 11:07 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Ivan and I are debugging some behavior of the source node this morning,
> > > and I was hoping to clarify that our understanding is correct.
> > >
> > > We observed that when using source node with a generator:
> > >
> > > https://github.com/apache/arrow/blob/66c66d040bbf81a4819b276aee306625dc02837c/cpp/src/arrow/compute/exec/options.h#L54
> > >
> > > The source node becomes "sequential" (batches come out in order, one at
> > > a time) even with a GetCpuThreadPool() attached to the plan.
> > >
> > > We traced the code into this class:
> > >
> > > https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L316
> > >
> > > It seems that, because of the synchronization in this class, it
> > > generates batches sequentially. Is this understanding correct, and is it
> > > intentional that the source node is sequential when backed by a
> > > generator? (This is actually the behavior that we want.)
> > >
> >
>
