That sounds great, Weston! Agreed that syncing up with releases seems like
a good idea.

On Mon, Apr 18, 2022 at 11:51 AM Weston Pace <weston.p...@gmail.com> wrote:

> I'm happy to provide a quarterly update on C++ engine work but in the
> future I'll draft it in PR form so others have a chance to pitch in.
> I was inspired by, and hope to mimic, the Rust community's very cool
> quarterly roadmap [1][2] as a place to have higher level discussions
> on what people are hoping to work on.  Since the C++ implementation
> has quarterly releases we can probably sync up with releases so I'll
> start a discussion about halfway to the 8.0.0 release.
>
> [1]
> https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
> [2] https://github.com/apache/arrow-datafusion/pull/2133
>
> On Mon, Apr 18, 2022 at 7:20 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> > Thanks Weston for providing the update on the C++ compute engine. IMO, it
> > would be very welcome to have that update be a quarterly email to the dev
> > mailing list, and may provide an opportunity to highlight issues in Jira
> > that are good first issues or neglected but important.
> >
> > On Wed, Apr 13, 2022 at 10:00 AM David Li <lidav...@apache.org> wrote:
> >
> > > Attendees:
> > >
> > > - David Li
> > > - Eduardo Ponce
> > > - Gavin Ray
> > > - Ian Cook
> > > - James Duong
> > > - Matthew Topol
> > > - Nic
> > > - Niranda
> > > - Raul Cumplido
> > > - Rok
> > > - Weston Pace
> > > - Will Jones
> > >
> > > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > > not be able to host the fortnightly sync call. Is anyone available to run
> > > the meeting that day?
> > >
> > > Agenda:
> > >
> > > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > > listing outstanding issues. In particular, there are some Go PRs that could
> > > use attention from an interested Go developer [2], as well as some temporal
> > > kernel PRs that could use a review [3].
> > >
> > > Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation
> > > have been improved for users, though most will likely use the engine
> > > through an API like Substrait; basic Substrait support has been added with
> > > forthcoming improvements; more tooling to measure performance is being
> > > worked on; general kernel execution overhead is being addressed with an eye
> > > towards running smaller batches through the engine. An asof join
> > > implementation is being worked on, and Go is working towards Substrait
> > > bindings to be able to bind to the C++ engine.
> > >
> > > Kernel vectorization/SIMD: Eduardo has been looking at making some of the
> > > primitive kernels (e.g. arithmetic) more easily autovectorized by the
> > > compiler, testing a variety of approaches. See related discussion [4]. We
> > > do not have benchmarks to evaluate compiler performance in this regard
> > > generally, but we have manually inspected some compiler output and found
> > > that not all compilers manage to do this with the current kernel
> > > implementations. We also don't have a holistic way to evaluate this going
> > > forward, nor do we have a sense for current benchmark coverage, though
> > > possibly we could generate benchmarks. However, it was pointed out that
> > > general engine performance is likely more important, and that current
> > > profiling indicates kernels are not yet a bottleneck, though there may be
> > > low-hanging fruit here.
> > >
> > > Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go;
> > > Flight SQL heavily uses union types, which are not yet implemented. A
> > > further proposal [5] has been submitted to extend the type metadata; those
> > > interested, please take a look. The GetXdbcTypeInfo proposal was merged,
> > > and the inline data proposal is still outstanding (but probably ready to
> > > have a vote).
> > >
> > > IPC/Format: it was asked if there's an IPC structure for serializing a
> > > single array to reduce overhead. Current APIs likely suffice but Niranda
> > > may submit a separate discussion to explain further.
> > >
> > > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > > [2]: https://github.com/apache/arrow/pull/12158
> > > [3]: https://github.com/apache/arrow/pull/12657
> > > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> > >
> > > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > > Hi all,
> > > >
> > > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > > >
> > > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > > >
> > > > Alternatively, enter this information into the Zoom website or app to
> > > > join the call:
> > > > Meeting ID: 876 4903 3008
> > > > Passcode: 958092
> > > >
> > > > Thanks,
> > > > Ian
> > >
>
