Re: Correct way to collect results from an Acero query

Li Jin Wed, 21 Sep 2022 11:46:51 -0700

Oh, thanks Weston. I am glad I am not the only one. I will wait for the PR
and will try to pull that in then.


Thanks,
Li

On Wed, Sep 21, 2022 at 1:54 PM Weston Pace <weston.p...@gmail.com> wrote:

> Funny you should mention this, I just ran into the same problem :).
> We use StartAndCollect so much in our unit tests that there must be
> some usefulness there.  You are correct that it is not an API that can
> be used outside of tests.
>
> I added utility methods DeclarationToTable, DeclarationToBatches, and
> DeclarationToExecBatches to exec_plan.h in [1]. These all take in a
> declaration (that does not have a sink node), add a sink node, create
> an exec plan, and run it.  It might be a bit before [1] merges so if
> you want to pull these out into their own PR that might be useful.
>
> The utility methods capture the common case where a user wants to use
> the default exec context and run the plan immediately.  The main
> downside of these utility methods is that they gather all results in
> memory.  However, if you are dealing with small amounts of data (e.g.
> prototyping, testing) or doing some kind of aggregation then this
> might not be a problem.
>
> We could probably also add a DeclarationToReader method in the future.
>
> [1] https://github.com/apache/arrow/pull/13782
>
> On Wed, Sep 21, 2022 at 8:26 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > Hello!
> >
> > I am testing a custom data source node I added to Acero and found myself in
> > need of collecting the results from an Acero query into memory.
> >
> > Searching the codebase, I found "StartAndCollect" is what many of the tests
> > and benchmarks are using, but I am not sure if that is the public API to do
> > so because:
> > (1) the header file arrow/compute/exec/test_util.h depends on gtest, which
> > seems to be a test-only dependency
> > (2) the method "StartAndCollect" doesn't return a Result/Status object, so
> > errors probably cannot be propagated.
> >
> > Is there a better way / some other public method to achieve this?
> >
> > Thanks,
> > Li
>
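
The steps described above for the utility methods (take a declaration that has
no sink, attach a sink, run the plan right away, gather everything in memory)
can be sketched without Arrow itself. The C++ below is a toy model, not Arrow
code: ToyBatch, ToySink, ToyResult, ToyDeclaration, ToyDeclarationToBatches,
MakeRangeSource and MakeFailingSource are all hypothetical names made up for
illustration. It only tries to show why a helper that returns a Result-like
value lets a failure reach the caller, which was concern (2) in the original
question.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins (assumptions, not Arrow types): a batch is a vector of ints,
// and a sink is any callable that accepts one batch.
using ToyBatch = std::vector<int>;
using ToySink = std::function<void(ToyBatch)>;

// Toy stand-in for a Result<T>: holds either a value or an error message.
template <typename T>
class ToyResult {
 public:
  static ToyResult Ok(T value) {
    ToyResult r;
    r.value_ = std::move(value);
    return r;
  }
  static ToyResult Error(std::string message) {
    ToyResult r;
    r.error_ = std::move(message);
    return r;
  }
  bool ok() const { return value_.has_value(); }
  const T& value() const {
    assert(ok());  // callers must check ok() first
    return *value_;
  }
  const std::string& error() const { return error_; }

 private:
  std::optional<T> value_;
  std::string error_;
};

// Hypothetical plan description with no sink: the source pushes batches into
// whatever sink it is handed and returns "" on success or an error message.
struct ToyDeclaration {
  std::function<std::string(const ToySink&)> run_source;
};

// Attach a collecting sink, run immediately, and return either every batch
// (all held in memory at once) or the error reported by the source.
ToyResult<std::vector<ToyBatch>> ToyDeclarationToBatches(const ToyDeclaration& decl) {
  using Out = ToyResult<std::vector<ToyBatch>>;
  if (!decl.run_source) return Out::Error("declaration has no source");
  std::vector<ToyBatch> collected;
  auto sink = [&collected](ToyBatch batch) { collected.push_back(std::move(batch)); };
  std::string error = decl.run_source(sink);
  if (!error.empty()) return Out::Error(std::move(error));
  return Out::Ok(std::move(collected));
}

// Example source: emits num_batches batches holding consecutive integers.
ToyDeclaration MakeRangeSource(int num_batches, int rows_per_batch) {
  ToyDeclaration decl;
  decl.run_source = [num_batches, rows_per_batch](const ToySink& emit) {
    int next = 0;
    for (int b = 0; b < num_batches; ++b) {
      ToyBatch batch;
      for (int r = 0; r < rows_per_batch; ++r) batch.push_back(next++);
      emit(std::move(batch));
    }
    return std::string();
  };
  return decl;
}

// Example source that emits one batch and then reports a failure.
ToyDeclaration MakeFailingSource() {
  ToyDeclaration decl;
  decl.run_source = [](const ToySink& emit) {
    emit(ToyBatch{1, 2});
    return std::string("source failed");
  };
  return decl;
}
```

A caller would check ok() before touching value(), so a failing source shows up
as an error message rather than a partial result. Because every batch is kept
in the collected vector, this shape has the same in-memory limitation that the
thread mentions for the real utility methods.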
