Funny you should mention this; I just ran into the same problem :). We use StartAndCollect so heavily in our unit tests that there is clearly a need for something like it. You are correct, though, that it is not an API that can be used outside of tests.
I added utility methods DeclarationToTable, DeclarationToBatches, and DeclarationToExecBatches to exec_plan.h in [1]. These all take in a declaration (that does not have a sink node), add a sink node, create an exec plan, and run it. It might be a bit before [1] merges so if you want to pull these out into their own PR that might be useful.

The utility methods capture the common case where a user wants to use the default exec context and run the plan immediately. The main downside of these utility methods is that they gather all results in memory. However, if you are dealing with small amounts of data (e.g. prototyping, testing) or doing some kind of aggregation then this might not be a problem. We could probably also add a DeclarationToReader method in the future.

[1] https://github.com/apache/arrow/pull/13782

On Wed, Sep 21, 2022 at 8:26 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> Hello!
>
> I am testing a custom data source node I added to Acero and found myself in
> need of collecting the results from an Acero query into memory.
>
> Searching the codebase, I found "StartAndCollect" is what many of the tests
> and benchmarks are using, but I am not sure if that is the public API to do
> so because:
> (1) the header file arrow/compute/exec/test_util.h depends on gtest, which
> seems to be a test-only dependency
> (2) the method "StartAndCollect" doesn't return a Result/Status object, so
> errors probably cannot be propagated.
>
> Is there a better way / some other public method to achieve this?
>
> Thanks,
> Li
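
For anyone skimming the thread, the pattern behind the helpers described in the reply (take a declaration with no sink, attach a collecting sink, run it right away, and hand back either all the results or an error) can be sketched without Arrow at all. The block below is only a toy model under that assumption: ToyBatch, ToyResult, ToyDeclaration, and ToyDeclarationToBatches are hypothetical names made up for illustration, not Arrow's types or signatures, and the real helpers in [1] may differ in every detail.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Arrow types.
using ToyBatch = std::vector<int>;

// Holds either a value or an error message, loosely modeled on a Result type.
template <typename T>
class ToyResult {
 public:
  static ToyResult Ok(T value) {
    ToyResult r;
    r.value_ = std::move(value);
    return r;
  }
  static ToyResult Error(std::string message) {
    ToyResult r;
    r.error_ = std::move(message);
    return r;
  }
  bool ok() const { return value_.has_value(); }
  // Precondition: ok() is true.
  const T& value() const {
    assert(ok());
    return *value_;
  }
  const std::string& error() const { return error_; }

 private:
  ToyResult() = default;
  std::optional<T> value_;
  std::string error_;
};

// A sink-less plan description: a source that pushes batches to whatever
// consumer it is given and returns a non-empty string on failure.
struct ToyDeclaration {
  std::function<std::string(const std::function<void(ToyBatch)>&)> source;
};

// Attaches a collecting sink, runs the source immediately, and returns every
// batch at once. Everything is held in memory, which is the trade-off the
// reply points out.
ToyResult<std::vector<ToyBatch>> ToyDeclarationToBatches(const ToyDeclaration& decl) {
  using Out = ToyResult<std::vector<ToyBatch>>;
  if (!decl.source) return Out::Error("declaration has no source");

  std::vector<ToyBatch> collected;
  auto sink = [&collected](ToyBatch batch) { collected.push_back(std::move(batch)); };

  std::string error = decl.source(sink);
  if (!error.empty()) return Out::Error(std::move(error));  // propagate failure
  return Out::Ok(std::move(collected));
}
```

Unlike a collector that returns the batches directly, the caller here has to check ok() before touching the results, which is the error-propagation gap point (2) of the original question raises.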