> I think the key idea is to let users call Gandiva functions to register functions and pass necessary info explicitly to Gandiva, rather than letting Gandiva discover them by itself.
That makes sense. Thanks Jin and Antonie for your valuable feedback. I will revise the proposal accordingly later. Regards, Yue On Wed, Sep 27, 2023 at 4:29 AM Jin Shang <shangjin1...@gmail.com> wrote: > I agree with Antoine that we don't need to define a JSON format or a > directory structure for Gandiva. > To support external functions, we essentially need two things: > 1. Gandiva's function registry needs to be aware of the function metadata: > We can achieve this by having a > `FunctionRegistry::AddFunction(NativeFunction* func)` function. The > `NativeFunction` can come from whatever source the user wants or even hard > coded, not necessarily from JSON files. The function registry is a > singleton so this should be easy. > 2. The LLVM engine needs access to the function IR definition: Users should > be able register a string representation of IR bytecode similar to what the > current `Engine::LoadPreCompiledIR` does. So something like > `LoadExternalIR(std::string_view)` is enough. Although `Engine` is not a > singleton, we can create a global object holding external IRs and Engines > can link them on construction. > I think the key idea is to let users call Gandiva functions to register > functions and pass necessary info explicitly to Gandiva, rather than > letting Gandiva discover them by itself. > > On Tue, Sep 26, 2023 at 2:14 AM Yue Ni <niyue....@gmail.com> wrote: > > > > The definition of an external function registry can certainly belong in > > Gandiva, but how it's populated should be left to third-party projects > > > > Are you proposing a more general approach, like incorporating the > following > > APIs into Gandiva? (Please note that the function names/signatures are > > tentative and just meant for illustrative purposes.) > > 1) AddExternalFunctionRegistry(ExternalFunctionRegistry > function_registry) > > 2) AddFunctionBitcodeLoader(FunctionBitcodeLoader bitcode_loader) > > Where `ExternalFunctionRegistry` can return a list of function > definitions > > and `FunctionBitcodeLoader` can return a list of bitcode buffers, so that > > the specific metadata/bitcode data population logic can be moved out of > > Gandiva? Thanks. > > > > Regards, > > Yue > > > > On Tue, Sep 26, 2023 at 12:25 AM Antoine Pitrou <anto...@python.org> > > wrote: > > > > > > > > Hi Yue, > > > > > > Le 25/09/2023 à 18:15, Yue Ni a écrit : > > > > > > > >> a CMake entrypoint (for example a function) making it easy for > > > > third-party projects to compile their own functions > > > > I can come up with a minimum CMake template so that users can compile > > C++ > > > > based functions, and I think if the integration happens at the LLVM > IR > > > > level, it is possible to author the functions beyond C++ languages, > > such > > > as > > > > Rust/Zig as long as the compiler can generate LLVM IR (there are > other > > > > issues that need to be addressed from the Rust experiment I made, but > > > that > > > > can be another proposal/PR). If we make that work, CMake is probably > > not > > > so > > > > important either since other languages can use their own build tools > > such > > > > as Cargo/zig build, and we just need some documentation to describe > how > > > it > > > > should be interfaced typically. > > > > > > As long as there's a well-known and supported way to generate the code > > > for external functions, then it's fine to me. > > > > > > (also the required signature for these functions should be documented > > > somewhere) > > > > > > >> The rest of the proposal (a specific JSON file format, a bunch of > > > functions > > > > to iterate directory entries in a specific layout) is IMHO off-topic > > for > > > > Gandiva, and each third-party project can implement their own idioms > > for > > > > the discovery of external functions > > > > > > > > Could you give some more guidance on how this should work without an > > > > external function registry containing metadata? As far as I know, for > > > each > > > > pre-compiled function used in an expression, Gandiva needs to lookup > > its > > > > signature from the function registry, which currently is a C++ class > > that > > > > is hard coded to contain 6 categories of built-in functions > > > > (arithmetic/datetime/hash/mathops/string/datetime arithmetic). If a > > third > > > > party function cannot be found in the registry, it cannot be used in > > the > > > > expression. If we don't load the pre-compiled function metadata from > > > > external files, how do we avoid Gandiva rejecting the expression > when a > > > > third party function cannot be found in the function registry? > Thanks. > > > > > > What I'm saying is that code to load function metadata from JSON and > > > walk directories of .bc files does not belong in Gandiva. The > definition > > > of an external function registry can certainly belong in Gandiva, but > > > how it's populated should be left to third-party projects (which then > > > don't have to use JSON or a given directory layout). > > > > > > Regards > > > > > > Antoine. > > > > > >