That sounds great, Weston! Agreed that syncing up with releases seems like a good idea.

On Mon, Apr 18, 2022 at 11:51 AM Weston Pace <weston.p...@gmail.com> wrote:

> I'm happy to provide a quarterly update on C++ engine work but in the
> future I'll draft it in PR form so others have a chance to pitch in.
> I was inspired by, and hope to mimic, the Rust community's very cool
> quarterly roadmap [1][2] as a place to have higher level discussions
> on what people are hoping to work on. Since the C++ implementation
> has quarterly releases we can probably sync up with releases so I'll
> start a discussion about halfway to the 8.0.0 release.
>
> [1] https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
> [2] https://github.com/apache/arrow-datafusion/pull/2133
>
> On Mon, Apr 18, 2022 at 7:20 AM Will Jones <will.jones...@gmail.com> wrote:
> >
> > Thanks Weston for providing the update on the C++ compute engine. IMO, it
> > would be very welcome to have that update be a quarterly email to the dev
> > mailing list, and may provide an opportunity to highlight issues in Jira
> > that are good first issues or neglected but important.
> >
> > On Wed, Apr 13, 2022 at 10:00 AM David Li <lidav...@apache.org> wrote:
> >
> > > Attendees:
> > >
> > > - David Li
> > > - Eduardo Ponce
> > > - Gavin Ray
> > > - Ian Cook
> > > - James Duong
> > > - Matthew Topol
> > > - Nic
> > > - Niranda
> > > - Raul Cumplido
> > > - Rok
> > > - Weston Pace
> > > - Will Jones
> > >
> > > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > > not be able to host the fortnightly sync call. Is anyone available to
> > > run the meeting that day?
> > >
> > > Agenda:
> > >
> > > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > > listing outstanding issues. In particular, there are some Go PRs that
> > > could use attention from an interested Go developer [2], as well as
> > > some temporal kernel PRs that could use a review [3].
> > >
> > > Arrow C++ Compute Engine: Weston gave a status update;
> > > APIs/documentation has been improved for users, though likely most will
> > > use it through an API like Substrait; basic Substrait support has been
> > > added with forthcoming improvements; more tooling to measure
> > > performance is being worked on; general kernel execution overhead is
> > > being addressed with an eye towards running smaller batches through the
> > > engine. An asof join implementation is being worked on, and Go is
> > > working towards Substrait bindings to be able to bind to the C++
> > > engine.
> > >
> > > Kernel vectorization/SIMD: Eduardo has been looking at making some of
> > > the primitive kernels (e.g. arithmetic) more easily autovectorized by
> > > the compiler, testing a variety of approaches. See related discussion
> > > [4]. We do not have benchmarks to evaluate compiler performance in this
> > > regard generally, but we have manually inspected some compiler output
> > > and found that not all compilers manage to do this with the current
> > > kernel implementations. We also don't have a holistic way to evaluate
> > > this going forward, nor do we have a sense for current benchmark
> > > coverage, though possibly we could generate benchmarks. However, it was
> > > pointed out that general engine performance is likely more important,
> > > and that current profiling indicates kernels are not yet a bottleneck,
> > > though there may be low-hanging fruit here.
> > >
> > > Flight/Flight SQL: we discussed the barriers to Flight SQL support in
> > > Go; Flight SQL heavily uses union types which are not yet implemented.
> > > A further proposal [5] has been submitted to extend the type metadata,
> > > please take a look for those interested. The GetXdbcTypeInfo proposal
> > > was merged, and the inline data proposal is still outstanding (but
> > > probably ready to have a vote).
> > >
> > > IPC/Format: it was asked if there's an IPC structure for serializing a
> > > single array to reduce overhead. Current APIs likely suffice but
> > > Niranda may submit a separate discussion to explain further.
> > >
> > > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > > [2]: https://github.com/apache/arrow/pull/12158
> > > [3]: https://github.com/apache/arrow/pull/12657
> > > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> > >
> > > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > > Hi all,
> > > >
> > > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > > >
> > > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > > >
> > > > Alternatively, enter this information into the Zoom website or app to
> > > > join the call:
> > > > Meeting ID: 876 4903 3008
> > > > Passcode: 958092
> > > >
> > > > Thanks,
> > > > Ian