Hi,

I ran into a limitation and would like to propose a resolution for it.

TL;DR: currently, users plan UDF calls with a call of the form

let e = scalar_functions("my_udf", vec![col("a")], DataType::Float64);
df.select(vec![e])

The proposal is to use instead:

let f = df.registry();
// note: no DataType::Float64
let e = f.udf("my_udf", vec![col("a")])?;
df.select(vec![e])

so that users do not have to know the return type of the UDF they are using
(the return type is still declared once, when the UDF is registered). This
will make our lives easier and will also let our own UDFs (e.g. sqrt)
support multiple input types (e.g. float32 and float64). It will matter most
for functions that return composite objects, such as array(), whose return
type depends heavily on its input types.
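
To make the idea concrete, here is a minimal, self-contained sketch of a registry that resolves a UDF's return type from its argument types at planning time. It is only an illustration: DataType, Registry, ReturnTypeFn, sqrt_return_type and array_return_type are hypothetical stand-ins that I made up for this email, not the types or signatures used in the PR.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow's DataType; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float32,
    Float64,
    List(Box<DataType>),
}

// Assumption: each UDF is registered with a function that derives its
// return type from the types of its arguments.
type ReturnTypeFn = fn(&[DataType]) -> Result<DataType, String>;

// Hypothetical registry: maps a UDF name to its return-type function.
struct Registry {
    udfs: HashMap<String, ReturnTypeFn>,
}

impl Registry {
    fn new() -> Self {
        Registry { udfs: HashMap::new() }
    }

    // Registration is the only place where typing information is declared.
    fn register(&mut self, name: &str, f: ReturnTypeFn) {
        self.udfs.insert(name.to_string(), f);
    }

    // Planning: look the UDF up by name and resolve its return type from
    // the argument types, so the caller never has to state it.
    fn return_type(&self, name: &str, args: &[DataType]) -> Result<DataType, String> {
        let f = self
            .udfs
            .get(name)
            .ok_or_else(|| format!("unknown UDF: {}", name))?;
        f(args)
    }
}

// sqrt returns the same float type it receives.
fn sqrt_return_type(args: &[DataType]) -> Result<DataType, String> {
    match args {
        [DataType::Float32] => Ok(DataType::Float32),
        [DataType::Float64] => Ok(DataType::Float64),
        _ => Err("sqrt expects exactly one float argument".to_string()),
    }
}

// array returns a list whose item type is the (single) type of its inputs.
fn array_return_type(args: &[DataType]) -> Result<DataType, String> {
    let first = args
        .first()
        .ok_or_else(|| "array expects at least one argument".to_string())?;
    if args.iter().all(|t| t == first) {
        Ok(DataType::List(Box::new(first.clone())))
    } else {
        Err("array expects arguments of a single type".to_string())
    }
}
```

With both functions registered, asking the registry for the return type of "sqrt" with a Float32 argument yields Float32, and "array" with two Float64 arguments yields a list of Float64. The caller passes only the name and the arguments, which is the point of the proposal.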

Proposal:
https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing

Issue: https://issues.apache.org/jira/browse/ARROW-9836
PR: https://github.com/apache/arrow/pull/8032

Best,
Jorge
