I don't have much to add, but it is really cool to see all the progress being made here.
On Wed, Sep 29, 2021 at 6:29 AM Neal Richardson <[email protected]> wrote:

> Hi all,
> I wanted to give an update on work that several of us have been doing in
> recent months on query processing in C++. Back in March, we circulated [1]
> a document [2] with a proposal on implementing the basic pieces of a query
> execution engine. Recent patches have introduced some key aspects of this
> functionality, including
>
> * ExecPlan and ExecNode API
> * Scalar and GroupBy aggregation nodes
> * OrderBy node
> * A faster, asynchronous Dataset scanner implementation
> * R bindings
>
> You may also have noticed a lot of activity around compute functions
> (kernels), which these can consume. We roughly doubled the number of
> compute functions between 4.0.0 and 5.0.0, and for 6.0.0, we’ve added the
> most common aggregation functions and have improved the scalar kernels to
> support a more complete set of types. See the compute functions page in the
> Arrow C++ library user guide for the full list of compute functions
> available in the current release [3].
>
> This means we can now scan, project, filter, aggregate, and sort in-memory,
> although the functionality is not easily accessible from bindings other
> than R just yet.
>
> In the coming weeks and months, we’ll continue to push ahead on this work,
> focusing on
>
> * More nodes, including joins [4] and top-k/bottom-k [5]
> * Alternative scheduling approaches such as work stealing approaches or
>   different methods of applying back pressure
> * The ability to spill to disk when necessary to cut down on memory
>   pressure
> * Connecting the query engine to the experimental Compute IR [6] and
>   exposing it in Python via ibis
>
> Thank you to the many members of the Arrow developer community who have
> contributed code, comments, and reviews. If you are interested in following
> or contributing to this work, please see the “Query engine 6.0 release” [7]
> and “Compute kernels 6.0 release” [8] Confluence pages.
>
> Neal
>
> [1]: https://lists.apache.org/thread.html/rb06b4dc2c6e53fe01784e22e669710710be747faadd46b608c9a27f5%40%3Cdev.arrow.apache.org%3E
> [2]: https://docs.google.com/document/d/1AyTdLU-RxA-Gsb9EsYnrQrmqPMOYMfPlWwxRi1Is1tQ/edit#heading=h.t89hffc3t7si
> [3]: https://arrow.apache.org/docs/cpp/compute.html
> [4]: https://github.com/apache/arrow/pull/11150
> [5]: https://issues.apache.org/jira/browse/ARROW-13973
> [6]: https://github.com/apache/arrow/pull/10934
> [7]: https://cwiki.apache.org/confluence/display/ARROW/Query+engine+6.0+release
> [8]: https://cwiki.apache.org/confluence/display/ARROW/Compute+kernels+6.0+release
