Hi Antoine,

Thanks for the feedback.

> a CMake entrypoint (for example a function) making it easy for
> third-party projects to compile their own functions
I can come up with a minimal CMake template so that users can compile
C++-based functions. I think that if the integration happens at the LLVM IR
level, it is also possible to author functions in languages other than C++,
such as Rust or Zig, as long as the compiler can generate LLVM IR (the Rust
experiment I made surfaced other issues that need to be addressed, but that
can be another proposal/PR). If we make that work, CMake is probably not so
important either, since other languages can use their own build tools such
as Cargo or zig build, and we just need some documentation describing how a
function should typically be interfaced.
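
As a rough illustration of what a third-party, C++-based function could look like, here is a minimal sketch. The function name and file name are hypothetical and are not part of Gandiva; the only assumption is that the function is exported with C linkage so that its symbol name in the generated IR is predictable.

```cpp
#include <cstdint>

// Hypothetical third-party function (not part of Gandiva).
// extern "C" disables C++ name mangling, so the symbol emitted into the
// LLVM IR is exactly "my_multiply_by_two_int32".
extern "C" int32_t my_multiply_by_two_int32(int32_t value) {
  return value * 2;
}
```

A file like this could be turned into LLVM bitcode with something along the lines of `clang++ -O2 -emit-llvm -c my_functions.cc -o my_functions.bc`; a CMake template would essentially wrap such a command.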

> The rest of the proposal (a specific JSON file format, a bunch of functions
> to iterate directory entries in a specific layout) is IMHO off-topic for
> Gandiva, and each third-party project can implement their own idioms for
> the discovery of external functions
Could you give some more guidance on how this should work without an
external function registry containing metadata? As far as I know, for each
pre-compiled function used in an expression, Gandiva needs to look up its
signature in the function registry, which is currently a C++ class that is
hard-coded to contain 6 categories of built-in functions
(arithmetic/datetime/hash/mathops/string/datetime arithmetic). If a
third-party function cannot be found in the registry, it cannot be used in
an expression. If we don't load the pre-compiled function metadata from
external files, how do we avoid Gandiva rejecting the expression when a
third-party function cannot be found in the function registry? Thanks.
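
To make the concern concrete, here is a simplified sketch of the lookup problem. `FunctionSignature` and `SketchRegistry` are hypothetical stand-ins written for this illustration; they are not Gandiva's actual classes. The point is only that a signature has to be registered, from whatever source, before an expression referring to it can be accepted.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not Gandiva's real classes.
struct FunctionSignature {
  std::string name;                      // name used in expressions
  std::vector<std::string> param_types;  // e.g. {"int32"}
  std::string return_type;               // e.g. "int32"
  std::string symbol;                    // symbol name inside the bitcode
};

class SketchRegistry {
 public:
  // Adds (or overwrites) a signature, keyed by name and parameter types.
  void Register(FunctionSignature sig) {
    std::string key = MakeKey(sig.name, sig.param_types);
    functions_[key] = std::move(sig);
  }

  // Returns nullptr when no matching signature is known; a caller
  // validating an expression would have to reject it in that case.
  const FunctionSignature* Lookup(
      const std::string& name,
      const std::vector<std::string>& param_types) const {
    auto it = functions_.find(MakeKey(name, param_types));
    return it == functions_.end() ? nullptr : &it->second;
  }

 private:
  // Builds a lookup key such as "my_multiply_by_two(int32,)".
  static std::string MakeKey(const std::string& name,
                             const std::vector<std::string>& param_types) {
    std::string key = name + "(";
    for (const auto& type : param_types) key += type + ",";
    return key + ")";
  }

  std::unordered_map<std::string, FunctionSignature> functions_;
};
```

In this sketch, `Lookup("my_multiply_by_two", {"int32"})` returns `nullptr` until some code calls `Register` with a matching signature. Whether that metadata comes from a JSON file or from a C++ call made by the third-party project is exactly the open question.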

Regards,
Yue

On Mon, Sep 25, 2023 at 10:39 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> Making Gandiva more extensible sounds like a worthwhile improvement.
>
> However, I'm not sure why we would need to choose a JSON-based format
> for this. Instead, I think Gandiva could simply provide the two
> following building blocks:
>
> 1. a CMake entrypoint (for example a function) making it easy for
> third-party projects to compile their own functions
>
> 2. a C++ entrypoint to load a bitcode file with the corresponding
> function definition(s)
>
> The rest of the proposal (a specific JSON file format, a bunch of
> functions to iterate directory entries in a specific layout) is IMHO
> off-topic for Gandiva, and each third-party project can implement their
> own idioms for the discovery of external functions.
>
> I'd add that this should be documented somewhere so that it is generally
> useful, not only for the contributors of the feature.
>
> Also, I hope that this will get more people interested in Gandiva
> maintenance.
>
> Regards
>
> Antoine.
>
>
> On 25/09/2023 at 16:17, Yue Ni wrote:
> > Hi there,
> >
> > I'd like to initiate a discussion regarding the proposal to introduce
> > external function registry support in Gandiva. I've provided a concise
> > description of the proposal in the following issue:
> > https://github.com/apache/arrow/issues/37753. I welcome any feedback or
> > comments on this topic. Please feel free to share your thoughts either here
> > on the mailing list or directly within the issue. Thank you for your
> > attention and help.
> >
> > *Background:*
> > Our team has been leveraging Gandiva in our projects, and its performance
> > and capabilities have been commendable. However, we've identified a
> > constraint concerning the registration of functions. At present, Gandiva
> > necessitates that functions be registered directly within its codebase.
> > This method, while functional, is not the most user-friendly and presents
> > hurdles for those aiming to incorporate third-party functions. Direct
> > modifications to Gandiva's source code for such integrations can
> > inadvertently introduce maintenance challenges and potential versioning
> > conflicts down the line.
> >
> > *Proposal:*
> > To address this limitation, I propose the introduction of an external
> > function registry mechanism in Gandiva. This would allow users and
> > developers to register and integrate custom functions without directly
> > modifying Gandiva's core source code. You can find more details in the
> > issue [1] and the PR [2].
> >
> > Any feedback is appreciated. Thanks.
> >
> > *References:*
> > [1] https://github.com/apache/arrow/issues/37753
> > [2] https://github.com/apache/arrow/pull/37787
> >
> > Regards,
> > Yue Ni
> >
>
