Hi folks,

I've been wanting a high-level overview of what's going on in the project,
but find it hard to keep up with all the mailing list threads. So I've added
a section to the maintenance dashboard that Alenka and I have been working
on, which gives a daily summary of the dev mailing list discussions from the
past 90 days.

It can be found at [1], and I've pasted an example of its output (today's
result) below [4].

It's super simple - we grab the dev mailing list, wrangle it into a shape
we can work with, and then get an LLM summary with a prompt we've iterated
on a few times [2].
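
For anyone curious what those three steps look like in practice, here is a
rough Python sketch. It is illustrative only and is not the actual arrowdash
code: the function names, the assumption that messages arrive as plain-text
`EmailMessage` objects, and the `llm` callable (a stand-in for whichever
model client is used) are all hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

WINDOW_DAYS = 90  # the 90-day window mentioned above


def normalize_subject(subject):
    """Strip leading Re:/Fwd: prefixes so replies group with their thread."""
    return re.sub(r"^(\s*(re|fwd?):\s*)+", "", subject or "", flags=re.IGNORECASE).strip()


def group_recent_threads(messages, now, window_days=WINDOW_DAYS):
    """Keep messages inside the window and group them by normalized subject.

    Assumes each message is a non-multipart, plain-text EmailMessage with a
    timezone-aware Date header (hypothetical input shape).
    """
    cutoff = now - timedelta(days=window_days)
    threads = defaultdict(list)
    for msg in messages:
        sent = parsedate_to_datetime(str(msg["Date"]))
        if sent < cutoff:
            continue  # older than the summary window
        threads[normalize_subject(msg["Subject"])].append((sent, msg.get_content().strip()))
    # Sort each thread chronologically
    return {subject: sorted(posts) for subject, posts in threads.items()}


def build_prompt(threads, instructions):
    """Join the prompt instructions and the thread text into one string."""
    sections = []
    for subject, posts in threads.items():
        lines = [f"Thread: {subject}"]
        lines += [f"[{sent:%Y-%m-%d}] {body}" for sent, body in posts]
        sections.append("\n".join(lines))
    return instructions + "\n\n" + "\n\n".join(sections)


def summarize(messages, instructions, llm, now=None):
    """Run the whole pipeline; `llm` is any callable mapping prompt -> text."""
    now = now or datetime.now(timezone.utc)
    return llm(build_prompt(group_recent_threads(messages, now), instructions))
```

Passing the model in as a plain callable keeps the wrangling step testable
without network access; the `instructions` argument would be the contents of
the prompt file linked at [2].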

I'd love some feedback on it. If you find this useful too, what works for
you in the current version, and what would you like to see added or done
differently?

Happy for responses here or as issues on the repo [3].

Cheers,

Nic


[1] https://arrow-maintenance.github.io/arrowdash/#overall-summary
[2]
https://github.com/arrow-maintenance/arrowdash/blob/main/ml_data/prompt_ml_summary.txt
[3] https://github.com/arrow-maintenance/arrowdash/issues
[4] Example summary below:

Ongoing Discussions

    ADBC Configuration: Finding a consensus on configuration file locations
and formats for ADBC drivers, with environment variables being a favored
option for ease of use (May 27, June 2).
    C++ CMake Build System: Simplifying the C++ CMake build configuration,
specifically regarding static and shared library linkage, is being explored
(June 3). The discussion includes whether to support building both shared
and static libraries simultaneously.
    nanoarrow Release Process: Preparing for the nanoarrow 0.7.0 release,
including pre-release checklist and finding a release signer (May 30, May
31).
    AWS Credit Usage: Discussing the optimal use of donated AWS credits for
CI improvements, GPU testing, and benchmarking. Large memory tests were
also mentioned (June 12).

Emerging Themes

    Project Component Decoupling: The effort to split language
implementations into separate repositories continues, with Swift being the
latest candidate for separation (May 16, May 19, May 20). This aims to
improve maintainability and allow for language-specific adaptations. C#
decoupling has also been raised (May 23).
    Modernizing C++ Standard: A push to switch the Arrow C++ codebase to
require C++20 is gaining momentum, promising benefits like improved
language features and library support (May 19).
    Community Engagement & Support: Identifying volunteers for the Arrow
Summit selection committee indicates an ongoing need for community
involvement (May 9, May 18, May 19). Kapa.ai’s bot presence on dev docs
indicates potential benefit to community engagement (June 12).
    Legacy Feature Removal: There is a move to deprecate and eventually
remove support for older features like Feather V1 format in C++,
encouraging migration to newer formats (June 3).

Potential Roadblocks

    Swift Implementation Support: Concerns exist about the level of
activity and support for the Swift implementation, even with the proposed
repository split (May 19, May 20).
    Complexity of Build Systems: The discussion around CMake configuration
highlights the complexity of managing build systems and supporting various
build configurations (June 3).
    Maintaining Deprecated Features: Balancing the removal of legacy
features with the potential impact on downstream users and existing
workflows is a consideration, as seen in the Feather V1 deprecation
discussion (June 3).

Strategic Plans

    Release Management: Proposing a feature freeze date for the Arrow
monorepo 21.0.0 release (July 1) indicates a focus on timely and
predictable releases (June 4).
    External Project Integration: A proposal to donate the arrow-gpu
project suggests interest in expanding Arrow’s capabilities in
GPU-accelerated computing (June 6).
    Resource Optimization: Leveraging donated AWS credits to improve CI and
benchmarking infrastructure demonstrates a commitment to optimizing
resources and improving development workflows (June 12).
    Codebase Cleanup: Removing stale Rust issues from the main repository
suggests an ongoing effort to maintain a clean and organized codebase (June
12). Removal of Skyhook from the main repository (June 16) further
demonstrates a desire to reduce the overall footprint of the repository.
